Hadoop not only lives in the big data cloud, it embodies the big data cloud. Maintained by the Apache Software Foundation, Hadoop turns 11 years old in 2017. This open source software provides distributed storage and the ability to process large, disparate datasets into actionable insight.
Self-driving cars. Supermarkets ditching not only the cashiers, but the self-checkout stations too. It doesn’t take long to find a story about job loss these days.
In the grand land of databases, you have the traditional RDBMS (here’s lookin’ at you, SQL) and an impressive lineup of the sexy, modern NoSQLs (say hey to MongoDB, Cassandra, Redis, HBase, and the gang). The trouble with relational databases is, and always has been, scalability. Darn things just don’t like to grow, and today’s data sets do enjoy GROWING. But RDBMS retrieve data like nobody’s business. Conversely, many NoSQL databases are ACID-less.
MongoDB is one of dozens of NoSQL databases that are gradually taking over from relational databases as big data enters the world of business. Relational databases just aren’t built for all the unstructured data required for modern data analytics, and NoSQL alternatives have lined up with offerings including MongoDB, Couchbase, HBase, and Cassandra, joined more recently by big data tools such as Flink.
Bigstep Solution Architect Andrei Muraru @ the HUG UK & Big Data Analytics London Meetup
How does Spark Structured Streaming work with real-time big data workloads? Here’s a case study presented by Bigstep Solution Architect Andrei Muraru, during the Big Data Week 2016 global festival.
Spark Structured Streaming lets you express streaming computations in the same way as batch computations on static data. The built-in engine incrementally and continuously updates the final results as streaming data continues to arrive. Andrei’s presentation covers how a real-life implementation of Spark Structured Streaming on top of a Hadoop cluster helps a big online retailer analyze clickstream data and aggregate it with customer history information.
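To make the "incremental update" idea concrete, here is a minimal plain-Python sketch. It does not use Spark or its API. It only illustrates the principle that a streaming aggregation processes each newly arrived batch and ends up with the same result as a one-shot computation over all the data. Every name in it (`CUSTOMER_SEGMENT`, `RunningClickCounts`, `batch_counts`, the event fields) is hypothetical and not taken from Andrei’s presentation.

```python
from collections import defaultdict

# Hypothetical static lookup table standing in for customer history.
CUSTOMER_SEGMENT = {"c1": "returning", "c2": "new"}


class RunningClickCounts:
    """Keeps per-segment click counts and updates them batch by batch."""

    def __init__(self, segments):
        self.segments = segments
        self.counts = defaultdict(int)

    def update(self, micro_batch):
        # Only the newly arrived events are processed; earlier results are reused.
        for event in micro_batch:
            # Enrich each click with the customer's segment (an illustrative join).
            segment = self.segments.get(event["customer_id"], "unknown")
            self.counts[segment] += 1
        return dict(self.counts)


def batch_counts(events, segments):
    """The same aggregation written as a one-shot computation over static data."""
    return RunningClickCounts(segments).update(events)
```

Feeding the clicks in as two small batches, or all at once through `batch_counts`, should give identical counts. That equivalence is the property the paragraph above describes, here shown at toy scale.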