Big Data Technologies

Hadoop in 2017: Bigger, Better, Faster?


Hadoop will stand strong in 2017.

Hadoop not only lives in the big data cloud, it embodies the big data cloud. An Apache Software Foundation project, Hadoop turns 11 years old in 2017. This open source software provides distributed storage and distributed processing that can turn large, disparate datasets into actionable insight.
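For readers new to Hadoop, its processing side has classically been MapReduce. A map step emits key/value pairs from each chunk of input, a shuffle groups the pairs by key, and a reduce step combines each group. The sketch below simulates that flow for a word count in plain Python on one machine. It is a conceptual model, not Hadoop's Java API, and the function names are hypothetical. On a real cluster, each input split would be mapped on a different node and the shuffle would move data over the network.

```python
from collections import defaultdict
from itertools import chain

# Conceptual, single-process simulation of MapReduce; NOT the Hadoop API.


def map_phase(line):
    # Map: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine every value for one key into a single result.
    return key, sum(values)


def word_count(splits):
    mapped = chain.from_iterable(map_phase(line) for line in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two hypothetical input splits, standing in for blocks of a file stored in HDFS.
    print(word_count(["big data big clusters", "data lakes and big data"]))
```

Because each `map_phase` call only looks at its own record, the splits can be processed in parallel. Only the shuffle needs to see every pair for a given key, and that division of labor is what lets the pattern scale across a cluster.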

Is Splice Machine a Viable Option for Your Hadoop SQL Database?

In the grand land of databases, you have the traditional RDBMS (here's lookin' at you, SQL) and an impressive lineup of the sexy, modern NoSQLs (say hey to MongoDB, Cassandra, Redis, HBase, and the gang). The trouble with relational databases is, and always has been, scalability. Darn things just don't like to grow, and today's data sets do enjoy GROWING. But an RDBMS retrieves data like nobody's business. NoSQL databases, conversely, scale out readily but usually give up full ACID transaction guarantees.
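To make the ACID trade-off concrete, here is a minimal sketch of atomicity (the "A" in ACID), using Python's built-in `sqlite3` module as a stand-in for a relational database. It is not Splice Machine code, and the `accounts` table and the `transfer`, `make_db`, and `balances` helpers are hypothetical. A failed transfer rolls back its partial update, so the database never shows money leaving one account without arriving in the other.

```python
import sqlite3

# Illustration of transactional atomicity with SQLite; table and helpers are hypothetical.


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates commit, or neither does."""
    try:
        # The connection's context manager commits on success and rolls back on an exception.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers rollback of the debit above
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    conn = make_db()
    print(transfer(conn, "alice", "bob", 30), balances(conn))   # succeeds and commits
    print(transfer(conn, "alice", "bob", 500), balances(conn))  # fails and rolls back
```

In a store that only guarantees atomicity per record, the failed second transfer would leave Alice's balance debited unless the application undid the change itself.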

MongoDB: The Freaky Patchwork Quilt of the Database World

MongoDB is one of dozens of NoSQL databases that are gradually taking over from relational databases as big data enters the world of business. Relational databases just can't handle all the unstructured data that modern data analytics requires, and NoSQL alternatives such as MongoDB, Couchbase, HBase, and Cassandra have lined up to fill the gap, joined more recently by big data processing tools like Flink.


Spark Structured Streaming in Practice

Bigstep Solution Architect Andrei Muraru at the HUG UK & Big Data Analytics London Meetup

How does Spark Structured Streaming work with real-time big data workloads? Here's a case study presented by Bigstep Solution Architect Andrei Muraru during the Big Data Week 2016 global festival.

Spark Structured Streaming lets you express a streaming computation the same way you would express a batch computation on static data. Its built-in engine runs that computation incrementally, continuously updating the final result as new streaming data arrives. Andrei's presentation covers how a real-life implementation of Spark Structured Streaming on top of a Hadoop cluster is helping a big online retailer analyze clickstream data and combine it with customer history information.
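The core idea (write the query once as if the data were static, and let the engine update the result as new data arrives) can be illustrated without Spark. The plain-Python sketch below is a conceptual model, not the Spark API. It treats each micro-batch of hypothetical clickstream events as new rows, joins them with a small static customer table, and folds the per-batch counts into running state. The field names, the `CUSTOMERS` table, and the `IncrementalQuery` class are all hypothetical. In Spark itself, a comparable query would read the clickstream as a streaming DataFrame, join it with a static DataFrame of customer history, and aggregate with `groupBy(...).count()`.

```python
from collections import Counter

# Conceptual model of Structured Streaming's incremental execution, NOT the Spark API.
# Hypothetical static table of customer history: customer_id -> customer segment.
CUSTOMERS = {"c1": "loyal", "c2": "new"}


def enrich(event):
    # Stream-static join: attach each click event's customer segment.
    return {**event, "segment": CUSTOMERS.get(event["customer_id"], "unknown")}


def batch_query(events):
    # The query as you would write it for static data: clicks per (segment, page).
    return Counter((e["segment"], e["page"]) for e in map(enrich, events))


class IncrementalQuery:
    """Runs the same query over an unbounded stream, one micro-batch at a time."""

    def __init__(self):
        self.state = Counter()  # running aggregate kept between micro-batches

    def process(self, micro_batch):
        # Aggregate only the new rows, then fold them into the running state.
        self.state.update(batch_query(micro_batch))
        # Emit the full updated result table (similar in spirit to "complete" output mode).
        return dict(self.state)


if __name__ == "__main__":
    micro_batches = [
        [{"customer_id": "c1", "page": "/home"}, {"customer_id": "c2", "page": "/cart"}],
        [{"customer_id": "c1", "page": "/home"}, {"customer_id": "c3", "page": "/home"}],
    ]
    query = IncrementalQuery()
    for batch in micro_batches:
        print(query.process(batch))
```

After every micro-batch, the incremental result equals what `batch_query` would return over all events seen so far. That equivalence between the streaming result and a batch query over the data received so far is the property the Structured Streaming model is designed around.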