Big Data Benchmarks

Smooth Run of Operations for Our Retailer Customers on Black Friday

Black Friday is the hottest day of the year for any retailer. Traffic and demand increase exponentially, and peaks in traffic are as high as they get. Most websites experience difficulties, which hurts both customers and companies. Not our clients, though, as the graph shows.

Spark Structured Streaming in Practice

Bigstep Solution Architect Andrei Muraru @ the HUG UK & Big Data Analytics London Meetup

How does Spark Structured Streaming work with real-time big data workloads? Here’s a case study presented by Bigstep Solution Architect Andrei Muraru, during the Big Data Week 2016 global festival.

Spark Structured Streaming provides the means to express streaming computations in the same way as computations on static data. The built-in engine incrementally and continuously updates the final results as streaming data continues to arrive. Andrei’s presentation covers how a real-life implementation of Spark Structured Streaming on top of a Hadoop cluster is helping a big online retailer analyze clickstream data and aggregate it with customer history information.
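To make the idea of incrementally updated results concrete, here is a minimal conceptual sketch in plain Python. It does not use the Spark API and is not the implementation from the presentation. The customer table, the event fields and the `RunningClickCounts` class are all hypothetical names chosen for illustration. It only shows the general pattern: each micro-batch of clickstream events is joined with static customer history and folded into a running aggregate, rather than recomputing everything from scratch.

```python
from collections import defaultdict

# Hypothetical static customer history: customer_id -> segment.
CUSTOMER_HISTORY = {"c1": "returning", "c2": "new"}


class RunningClickCounts:
    """Per-segment click counts, updated one micro-batch at a time."""

    def __init__(self, history):
        self.history = history
        self.counts = defaultdict(int)

    def update(self, micro_batch):
        # Join each click with the customer history, then fold it into
        # the running result instead of reprocessing earlier batches.
        for event in micro_batch:
            segment = self.history.get(event["customer_id"], "unknown")
            self.counts[segment] += 1
        # Return a snapshot of the result as of this batch.
        return dict(self.counts)


agg = RunningClickCounts(CUSTOMER_HISTORY)
first = agg.update([
    {"customer_id": "c1", "page": "/home"},
    {"customer_id": "c2", "page": "/cart"},
])
second = agg.update([
    {"customer_id": "c1", "page": "/checkout"},
    {"customer_id": "c9", "page": "/home"},  # not in the history table
])
print(first)
print(second)
```

The second snapshot builds on the first: only the new events are processed, yet the result reflects all events seen so far. A real streaming engine adds what this sketch leaves out, such as fault tolerance, distribution and late-data handling.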

Building Data Lakes in the Cloud

Understand how building a data lake in the cloud differs from building one on premises

Every industry has both proven and potential data lake use cases. With enterprise data warehouses (EDWs) being rendered ever more inefficient when facing new business needs, cloud-based data lakes have been gaining popularity with enterprises looking to cover the technology gap. Cloud data lakes are purpose-built to meet the data management requirements of the evolving enterprise landscape.

A Business User’s Guide to Big Data on Hadoop

This webcast will give an overview of deploying Hadoop within the organization as a strategic initiative for business advantage. Rather than viewing distributed databases as an incremental solution to an IT problem, Ioana Hreninciuc, our Commercial Director, will look at the bigger picture: what can Hadoop do, not for the database administrators, but for the organization as a whole?

Register and you will get a complete overview of the Business User’s Guide to Big Data on Hadoop.


Memory, Big Data, NoSQL and Virtualization

In-memory processing has started to become the norm in large-scale data handling. This is a close-to-the-metal analysis of highly important but often neglected aspects of memory access times and how they impact big data and NoSQL technologies.

We cover aspects such as the TLB, Transparent Huge Pages, the QPI link, Hyper-Threading and the impact of virtualization on applications with a large memory footprint. We present benchmarks of various technologies, ranging from Cloudera’s Impala to Couchbase, and show how they are impacted by the underlying hardware.
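As a rough illustration of why the TLB and huge pages matter for large memory footprints, the sketch below computes "TLB reach": how much memory can be addressed without a TLB miss. The entry count of 1536 and the 64 GiB working set are assumptions chosen for the example, not figures from the presentation. Real processors typically have several TLB levels with different entry counts per page size.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries, page_size):
    """Bytes of memory covered by the TLB when every entry is in use."""
    return entries * page_size


def pages_needed(working_set, page_size):
    """Number of pages required to map the working set (rounded up)."""
    return -(-working_set // page_size)


ENTRIES = 1536           # assumed TLB entry count (hypothetical)
WORKING_SET = 64 * GIB   # assumed in-memory data set size (hypothetical)

reach_small = tlb_reach(ENTRIES, 4 * KIB)   # standard x86-64 page
reach_huge = tlb_reach(ENTRIES, 2 * MIB)    # x86-64 huge page

print(f"4 KiB pages: reach {reach_small // MIB} MiB, "
      f"{pages_needed(WORKING_SET, 4 * KIB)} pages to map the working set")
print(f"2 MiB pages: reach {reach_huge // GIB} GiB, "
      f"{pages_needed(WORKING_SET, 2 * MIB)} pages to map the working set")
```

Under these assumptions, the same number of TLB entries covers 512 times more memory with 2 MiB pages than with 4 KiB pages, which is one reason page size can affect memory-heavy workloads.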

The key takeaway from the presentation below is a better understanding of how to size a cluster, how to choose a cloud provider and an instance type for big data and NoSQL workloads, and why not every core or GB of RAM is created equal.


If you have any questions, let us know in the comments.