We apologize for being a bit slow to share our findings with the internet at large. We are trying to make up for it now.
We started testing Impala in an effort to understand what hardware setup would provide the best performance for the price. We weren’t interested in how it performs in extreme cases, but in the regular situations that most users would encounter. We aimed to provide a quick, practical guide for choosing the infrastructure to run Impala on.
With this in mind, we looked at a medium-sized deployment of 4 Full Metal Compute Instances, and we scaled the hardware from a single CPU with low RAM capacity to dual CPUs with high RAM capacity.
We didn’t start out aiming to prove any particular assumption. The purpose of the project was to explore and understand how Impala works with hardware, and our findings were quite surprising.
If you have any questions or would like us to do a benchmark on a certain technology, just drop a comment and we’ll get right back to you.