Calin-Andrei Burloiu, Big Data Engineer at antivirus company Avira, and Radu Pastia, Senior Software Developer in the Big Data Team at Orange, are the team behind Couchdoop – a high performance connector for bridging Hadoop and Couchbase.
Calin and Radu ran their CDH + Couchbase setup on the Full Metal Cloud and documented how Couchdoop performs as environment parameters vary. These are their findings.
Avira’s large scale applications have a traditional 2-tier architecture:
• An analytical tier, built around the Hadoop ecosystem, which crunches large amounts of user event logs. We use Cloudera’s Distribution of Hadoop (CDH).
• A real-time tier, which exposes web services to almost 100 million users. This tier requires a high-performance database, and we decided to use Couchbase, which is known for its sub-millisecond response time.
However, when we tried to integrate the two technologies, Hadoop (CDH) and Couchbase, we soon concluded that the existing solutions created a bottleneck. So we decided to write our own, and Couchdoop, a high-performance Hadoop connector for Couchbase, was born.
This presentation was made during the Couchbase London Meetup.
Our Product Manager, Alex, ran a performance benchmark on Couchbase. It shows sub-millisecond response times, but also a difference between GET/PUT operations and QUERY operations when multiple instances are added to the cluster.
Next week we will be presenting at a meetup, where we will talk about Couchbase and Hadoop: how to quickly move data from one to the other and how to get sub-millisecond response times with Couchbase.
The meetup will take place on the 17th of September and it will feature speakers from Avira as well.
Avira is a worldwide leading supplier of security solutions for professional and private use. They have been using Couchbase and CDH in production for two years, working on customer behavior analysis and exploring how machine learning can improve their clients’ experience.
As usage of Hadoop has broadened, the choice of DB technology and deployment platform has become a key factor in the performance of your data analysis setup.
But how do you decide which way to go? SQL or NoSQL? Cloud or on premise? Virtualised or bare metal? We’ve decided to answer these questions, so we worked with Exasol and Couchbase to find the best performance/price setup for both SQL and NoSQL on Hadoop. We will be presenting our findings at the next Hadoop Users Group Meetup on the 16th of September.
Avira will also join us and share best practices on how to engineer connectors between systems in order to maximize performance. Their own Couchdoop was built to link their CDH cluster with Couchbase in production.
Then Alex, our Product Manager, will follow up with a few tips and tricks on how to improve performance in just about any existing infrastructure by as much as 60%. Yes, it can be done 🙂
This is the first in a series of performance benchmarks on NoSQL DBs that we plan to share with you. Our goal is to understand the various scaling profiles of distributed database technologies and to identify environments that provide the optimum performance/price ratio. Many of our findings also apply to on-premise infrastructure and even to some cloud scenarios.
This performance benchmark on Couchbase shows sub-millisecond response times but also a difference between GET/PUT operations and QUERY operations when multiple instances are added to the cluster. We have also tested the Memory-Access-Time sensitivity of Couchbase.