Once limited to internet giants, Hadoop is moving into the business mainstream, allowing companies to ingest and analyze massive quantities of structured and unstructured data. To realize its full data-crunching capacity, Hadoop needs powerful infrastructure, and most companies don’t have the hardware necessary to set up Hadoop clusters on their own premises.
But now providers offer tools that let businesses use Hadoop in the cloud. This is terrific for businesses whose data ingestion and processing needs are unpredictable or intermittent. However, Hadoop’s input/output (I/O) demands are heavy, and virtualization layers can slow I/O down considerably. Hadoop is a great enabler, but for maximum performance it needs to run on bare metal rather than in a virtualized environment.
Hadoop in the Cloud
Hadoop is open source, runs on commodity infrastructure, and is very flexible in its approach to data analytics. But in a cloud environment, end-users lack the control over resources needed to extract maximum performance from Hadoop. Although Hadoop was built around the principle of keeping compute and storage together (data locality), cloud deployments typically separate the two. For example, on AWS, the storage layer, S3, is separate from the compute layer, EC2.
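That split shows up directly in a cluster’s configuration: Hadoop’s fs.defaultFS property names the default filesystem, and pointing it at HDFS keeps data on the compute nodes, while an s3a:// URI sends every read and write over the network to S3. A minimal core-site.xml sketch, where the hostname and bucket name are placeholders:

```xml
<!-- core-site.xml sketch; hostname and bucket name are placeholders -->

<!-- Colocated compute and storage: the default filesystem is HDFS,
     so data blocks live on the same nodes that run the jobs. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.internal:8020</value>
</property>

<!-- Separated compute and storage: the default filesystem is an S3
     bucket, so reads and writes cross the network to object storage. -->
<!--
<property>
  <name>fs.defaultFS</name>
  <value>s3a://my-data-bucket</value>
</property>
-->
```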
With bare metal computing, compute and storage stay together. Moreover, there’s no hypervisor, so there are no extra layers between application and hardware, which makes processing considerably faster. Bare metal can also offer significantly more connectivity at the machine level than even the most advanced cloud architecture. It delivers the lowest network latency possible and lets you pair high network capacity with storage that can be spread across multiple SSD devices, preventing the bottlenecks that result from relying on a single central storage device.
Effects of Virtualization on Performance
Software developer Peter Senna Tschudin benchmarked a number of virtualization solutions, including VMware, Xen, and Hyper-V, and found that virtualization overhead can double disk latency and slow network I/O by a quarter. For a one-off data processing job, this may not be a problem, but in a big data environment the slowdown can translate into significant wasted resources and higher operational costs. Furthermore, virtualization overhead varies significantly with utilization: the same query might take 10 milliseconds or 20 milliseconds, depending on how loaded the host is.
HubSpot CIO Jim O’Neill demonstrated a four-fold difference in efficiency between virtualization on top of OpenStack and OpenStack on a private bare metal architecture. In big data analyses involving long sequences of queries, cloud overhead from disk latency and network I/O really adds up. With bare metal computing, you avoid these problems.
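The compounding effect is easy to see with a back-of-the-envelope model. Using the illustrative figures above (10 ms per query on bare metal, 20 ms when virtualization doubles disk latency), a strictly sequential pipeline of queries simply multiplies the difference; the query count below is a hypothetical example, not a benchmark:

```python
# Back-of-the-envelope model (not a benchmark): how per-query latency
# overhead compounds over a job that issues many sequential queries.

BARE_METAL_MS = 10    # per-query latency on bare metal (article's example)
VIRTUALIZED_MS = 20   # doubled disk latency under virtualization

def total_seconds(per_query_ms: float, num_queries: int) -> float:
    """Total wall-clock time for a strictly sequential query pipeline."""
    return per_query_ms * num_queries / 1000.0

queries = 100_000  # hypothetical big data job
print(total_seconds(BARE_METAL_MS, queries))   # 1000.0 seconds
print(total_seconds(VIRTUALIZED_MS, queries))  # 2000.0 seconds
```

Over a hundred thousand queries, a 10 ms difference per query becomes a quarter of an hour of extra wall-clock time, paid for in compute billing.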
Bare Metal, Performance, and Security
In addition to the performance trade-offs of virtualized Hadoop deployments, end-users must also consider security. Data encryption is the cornerstone of security, but bare metal computing adds physical isolation. With no hypervisor and no other tenants on the same servers or management platform, compute instances are physically isolated; because the machines share no resources or applications, there is no danger of outside interference.
Bare Metal Infrastructure as a Service Incorporates Cloud Flexibility
When bare metal infrastructure is provided as a service, end-users get the flexibility and scalability of the cloud along with the performance of bare metal computing. End-users also gain more control than in a typical cloud environment. For example, you may want all your machines in the same rack, connected to the same switch, and bare metal infrastructure as a service lets you do that. Plus, with leading bare metal providers, you know the specs of the actual hardware running your processes rather than trusting them to “mystery metal” that could be outdated. Imagine 40 GbE network transfer speeds with the convenience of the cloud!
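To put that 40 GbE figure in perspective, here is a rough, idealized estimate of bulk-transfer time at different link speeds. It ignores protocol overhead and disk limits, and the 1 TB working set is a hypothetical example:

```python
# Idealized bulk-transfer time at a given link speed.
# Ignores protocol overhead, congestion, and disk throughput limits.

def transfer_seconds(data_gb: float, link_gbps: float) -> float:
    """Seconds to move data_gb gigabytes over a link_gbps gigabit/s link."""
    return data_gb * 8 / link_gbps  # 8 bits per byte

dataset_gb = 1000  # hypothetical 1 TB working set
print(transfer_seconds(dataset_gb, 10))  # 800.0 seconds on 10 GbE
print(transfer_seconds(dataset_gb, 40))  # 200.0 seconds on 40 GbE
```

Even under these generous assumptions, the link speed sets a hard floor on how fast a Hadoop job can shuffle or replicate data, which is why machine-level connectivity matters.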
Hadoop is the great enabler for capturing, storing, transferring, and analyzing big data, and today you can provision Hadoop in the cloud. That is terrific for businesses without the resources for dedicated, on-premises hardware, but the very things that make the cloud convenient can drag Hadoop performance down. The solution is bare metal computing, which lets Hadoop blaze through jobs unencumbered by hypervisors and less-than-optimal connectivity. For resource-intensive use cases, there is simply no better answer than bare metal infrastructure provided as a service: you get the speed of bare metal and the convenience of the cloud, sacrificing neither.