In-memory processing is becoming the norm in large-scale data handling. This is a close-to-the-metal analysis of highly important but often neglected aspects of memory access times and how they impact big data and NoSQL technologies.
We cover aspects such as the TLB, Transparent Huge Pages, the QPI link, Hyper-Threading, and the impact of virtualization on applications with a large memory footprint. We present benchmarks of various technologies, ranging from Cloudera’s Impala to Couchbase, and show how they are impacted by the underlying hardware.
The key takeaway from the presentation below is a better understanding of how to size a cluster, how to choose a cloud provider and an instance type for big data and NoSQL workloads, and why not every core or GB of RAM is created equal.
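To make the idea of memory access time concrete, here is a minimal sketch of a pointer-chasing micro-benchmark. It is not taken from the presentation: the function names `build_chain` and `chase` and the array size are assumptions chosen for illustration. It follows a chain of indices through a large list, once in sequential order and once in shuffled order, and reports the average time per hop.

```python
import random
import time


def build_chain(n, shuffled, seed=0):
    """Return a list where chain[i] is the index to visit after i.

    The chain forms a single cycle over all n slots, so following it
    n times touches every slot once and returns to the start.
    """
    order = list(range(n))
    if shuffled:
        random.Random(seed).shuffle(order)
    chain = [0] * n
    for pos in range(n):
        chain[order[pos]] = order[(pos + 1) % n]
    return chain


def chase(chain, steps):
    """Follow the chain for `steps` hops; return (final index, ns per hop)."""
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = chain[idx]
    elapsed = time.perf_counter_ns() - start
    return idx, elapsed / steps


if __name__ == "__main__":
    n = 2_000_000  # hypothetical size, meant to exceed typical CPU caches
    for label, shuffled in (("sequential", False), ("random", True)):
        chain = build_chain(n, shuffled)
        _, ns_per_hop = chase(chain, n)
        print(f"{label:>10}: {ns_per_hop:.1f} ns per hop")
```

Interpreter overhead blurs the numbers in Python, so treat any gap between the two runs as a rough indication only. A compiled version of the same loop would isolate cache and TLB effects more cleanly.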
People make wrong design decisions all the time. Typical developers are under immense pressure to meet a deadline, and they very often just make gut decisions and choose technologies they haven’t worked with before because they sound good on paper. The reverse is also often true: senior developers go with proven technologies and their known pitfalls because they simply don’t think other technologies can be better.
There are two ways to get a feel for your environment’s performance profile: Continue Reading
Calin-Andrei Burloiu, Big Data Engineer at antivirus company Avira, and Radu Pastia, Senior Software Developer in the Big Data Team at Orange, are the team behind Couchdoop, a high-performance connector for bridging Hadoop and Couchbase.
Calin and Radu ran their CDH + Couchbase setup on the Full Metal Cloud and documented the performance of Couchdoop while varying environment parameters. These are their findings.
Avira’s large-scale applications have a traditional 2-tier architecture:
• An analytical tier built around the Hadoop ecosystem, which crunches large amounts of user event logs. We use Cloudera’s Distribution of Hadoop (CDH).
• A real-time tier, which exposes web services to almost 100 million users. This tier requires a high-performance database, and we decided to use Couchbase, which is known for its sub-millisecond response time.
However, when we tried to integrate the two technologies, Hadoop (CDH) and Couchbase, we soon concluded that the existing solutions just created a bottleneck. So we decided to write our own, and Couchdoop, a high-performance Hadoop connector for Couchbase, was born. Continue Reading
This is the first in a series of performance benchmarks on NoSQL databases that we plan to share with you. Our goal is to understand the various scaling profiles of distributed database technologies and to identify environments that provide the best performance for the price. Many of our findings can also be applied to on-premises infrastructure and even to some cloud scenarios.
This performance benchmark on Couchbase shows sub-millisecond response times but also a difference between GET/PUT operations and QUERY operations when multiple instances are added to the cluster. We have also tested the Memory-Access-Time sensitivity of Couchbase. Continue Reading