Big Data Benchmarks

Spark Structured Streaming in Practice

Bigstep Solution Architect Andrei Muraru @ the HUG UK & Big Data Analytics London Meetup

How does Spark Structured Streaming work with real-time big data workloads? Here’s a case study presented by Bigstep Solution Architect Andrei Muraru during the Big Data Week 2016 global festival.

Spark Structured Streaming provides the means to express streaming computations in the same way as computations on static data. The built-in engine incrementally and continuously updates the final results as streaming data continues to arrive. Andrei’s presentation covers how a real-life implementation of Spark Structured Streaming on top of a Hadoop cluster is helping a big online retailer analyze clickstream data and aggregate it with customer history information.
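To make the incremental model concrete, here is a minimal conceptual sketch in plain Python. It does not use the Spark API, and it is not taken from the presentation: the `CUSTOMER_HISTORY` table, the event fields, and the `RunningClickCounts` class are all hypothetical names chosen for illustration. It only shows the idea that one aggregation can run once over static data or be folded in batch by batch as click events arrive, with each event enriched from customer history.

```python
from collections import defaultdict

# Hypothetical static lookup table: customer id -> history record.
CUSTOMER_HISTORY = {
    "c1": {"segment": "returning"},
    "c2": {"segment": "new"},
}


class RunningClickCounts:
    """Keeps per-segment click counts up to date as micro-batches arrive."""

    def __init__(self, history):
        self.history = history
        self.counts = defaultdict(int)

    def update(self, batch):
        """Fold one micro-batch of click events into the running result."""
        for event in batch:
            # Enrich the click with customer history (a lookup join).
            record = self.history.get(event["customer_id"])
            segment = record["segment"] if record else "unknown"
            self.counts[segment] += 1
        # Return a snapshot of the result as of this batch.
        return dict(self.counts)


def batch_counts(events, history):
    """The same aggregation written as a one-shot job over static data."""
    return RunningClickCounts(history).update(events)
```

Feeding the events in several small batches through `update` ends with the same counts as calling `batch_counts` once on the full list, which is the property that lets a streaming query be written like a static one.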

Building Data Lakes in the Cloud

Understand how building a data lake in the cloud differs from building one on premises

Every industry has both proven and potential data lake use cases. With enterprise data warehouses (EDWs) being rendered ever more inefficient when facing new business needs, cloud-based data lakes have been gaining popularity with enterprises looking to cover the technology gap. Cloud data lakes are purpose-built to meet the data management requirements of the evolving enterprise landscape.

A Business User’s Guide to Big Data on Hadoop

This webcast will give an overview of deploying Hadoop within the organization as a strategic initiative for business advantage. Rather than viewing distributed databases as an incremental solution to an IT problem, Ioana Hreninciuc, our Commercial Director, will look at the bigger picture: what can Hadoop do, not for the database administrators, but for the organization as a whole?

Register and you will get a complete overview of the Business User’s Guide to Big Data on Hadoop.

Memory, Big Data, NoSQL and Virtualization

In-memory processing has started to become the norm in large-scale data handling. This is a close-to-the-metal analysis of important but often neglected aspects of memory access times and how they impact big data and NoSQL technologies.

We cover aspects such as the TLB, Transparent Huge Pages, the QPI link, Hyper-Threading, and the impact of virtualization on applications with a large memory footprint. We present benchmarks of various technologies, ranging from Cloudera’s Impala to Couchbase, and show how they are impacted by the underlying hardware.

The key takeaway from the presentation below is a better understanding of how to size a cluster, how to choose a cloud provider and an instance type for big data and NoSQL workloads, and why not every core or GB of RAM is created equal.

If you have any questions, let us know in the comments.

The Internet of Things: Is Your Business Prepared?

The term Internet of Things, or IoT, has circulated for a while now. Most people consider it just a buzzword, but the reality is that it is here and here to stay. The IoT is real in hospitals, where advanced systems are being used to track and manage patients and medical equipment. It is real in smart homes, smart cars, and a variety of other places. What does the IoT mean for your business? More importantly, what can you do to get your business ready?