Technically Speaking

The Official Bigstep Blog

 

5 Essential Tips for the Hadoop Ecosystem You Must Know Before 2017

You sat on the sidelines, anxiously awaiting the play caller’s decision. Are big data and data analytics the way to score a touchdown, or is the call still under review? After further review, the decision on the field stands ... the Hadoop ecosystem features all the X’s and O’s you need for solid BI, marketing data, or any other purpose you have for big data and data analytics. What other insider tips and tricks do you need to score the extra points? We’re so glad you asked ...

1. Hadoop & Spark: Where There’s Smoke There’s Fire

Spark’s roaring fast, alright. It’s just still a little rough around the edges.

Like Hadoop, Spark isn’t yet perfected. It’s kind of like the redshirt freshman who’s doing way better than the coaches predicted, but still has some practice ahead. Still, both Cloudera and Hortonworks are convinced Spark will succeed, so placing it in at quarterback is a solid decision, coach. Spark’s strongest arm is streaming, so use it for all your real-time plays.

2. Hive’s Sting is Excruciating, Yet Liberating

Hive is painfully slow, like a linebacker just back from Thanksgiving at Grandma’s. But it handily converts your SQL into MapReduce jobs, and can be switched to run on Tez instead, which does speed it up a notch. In its defense, Hive is straightforward when it comes to utilizing whatever SQL charting tool you prefer, and it plays nicely with other Hadoop ecosystem MVPs, like Phoenix and Impala.
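To picture what "converting SQL into MapReduce" means, here is a rough sketch in plain Python of how a query like SELECT team, SUM(points) FROM scores GROUP BY team could break down into map, shuffle, and reduce steps. This is not Hive's actual planner or API; the table, the rows, and every function name below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows from a made-up "scores" table: (team, points).
ROWS = [
    ("hawks", 7),
    ("bears", 3),
    ("hawks", 3),
    ("bears", 7),
    ("hawks", 6),
]


def map_phase(rows):
    """Emit one (key, value) pair per row, keyed by the GROUP BY column."""
    for team, points in rows:
        yield team, points


def shuffle_phase(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Apply the aggregate (SUM here) to each key's list of values."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    """Chain the three phases, mimicking a single GROUP BY / SUM job."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))  # {'hawks': 16, 'bears': 10}
```

On a real cluster each phase runs across many machines and writes intermediate results to disk, which is a big part of why Hive on MapReduce feels slow. In Hive itself, the engine is chosen with the hive.execution.engine property, for example SET hive.execution.engine=tez;.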

3. Learn to Love Hating Kerberos

Kerberos is like your favorite team’s main rival. You just love to hate it.

Just like your favorite team’s primary rival, Kerberos is the red-headed stepchild of the Hadoop ecosystem, the one you love to hate. Kerberos is a network authentication protocol that is painfully difficult to set up, but it does deliver a powerful QB sack when integrating with Active Directory. For some pre-game salve to make it a bit easier and more palatable, pair it with a tool like Ranger or Sentry, which handle authorization policies on top of Kerberos authentication.

4. Learn to Love Loving Kafka

Another must-have tool if your goal is real-time analytics, Kafka is a newer player, drafted directly out of the Apache program. It’s easy to use, and while it may lack some of the savvy finesse of more sophisticated players, it’s a powerhouse for building data pipelines and streaming applications. It’s also scalable horizontally, fault tolerant, and blindingly fast.
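For a feel of why Kafka suits data pipelines, here is a toy, in-memory sketch of its core idea: a topic is split into append-only partitions, records with the same key land in the same partition, and each consumer keeps track of its own read offset. This is not Kafka's client API. The class names are hypothetical, and the CRC32 hash is only a stand-in for whatever partitioner a real client uses.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only log (not Kafka's API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash, so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def consume(self, partition, offset):
        """Return every record in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


class ToyConsumer:
    """Remembers how far it has read in each partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self):
        """Fetch unread records from all partitions and advance the offsets."""
        records = []
        for p, offset in enumerate(self.offsets):
            batch = self.topic.consume(p, offset)
            records.extend(batch)
            self.offsets[p] += len(batch)
        return records


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=3)
    topic.produce("user-1", "click")
    topic.produce("user-2", "click")
    topic.produce("user-1", "purchase")
    consumer = ToyConsumer(topic)
    print(consumer.poll())  # all three records, ordered within each partition
    print(consumer.poll())  # [] because nothing new has arrived
```

Because partitions are independent, a real cluster can spread them across brokers, which is where the horizontal scaling comes from. Replication, the piece that provides fault tolerance, is left out of this sketch.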

5. That’ll About Do It, Pig

Have you ever noticed how few pigs are represented among the mascots of football teams? Well, Pig is also slated to be cut from the team when it comes to the Hadoop ecosystem. While there are still a fair number of teams utilizing Pig, it just isn’t as easy as some of the alternatives like PL/SQL. Spark is a lot speedier out of the pocket, and is a lot more flexible in terms of use cases. If you’re into Pig, there’s no foul on the play, but if you’re just scrambling up a Hadoop operation, it’s better to rotate another player into formation, such as NiFi (another Apache recruit) or Kettle.

If big data’s your game, Hadoop’s your name. Find out what the team of Hadoop players can add to your operations and see our full line of products to support your big data plays today!
