Technically Speaking

The Official Bigstep Blog

 

Learning to Live with (and Overcome) Hadoop's Flaws

When it comes to managing big data, no system can match Hadoop at working with huge data sets composed of structured data, unstructured data, or a mix of the two. Advertisers have jumped on the Hadoop bandwagon, joined by online travel agencies, energy companies, image processing firms, healthcare providers, IT security teams, and many others. Yet Hadoop has issues that cause headaches for the companies that embrace it. What are Hadoop’s worst flaws? More importantly, how can your organization overcome those flaws to achieve big data success?

Understanding Hadoop’s Complexity Issues

Using Hadoop can sometimes seem more complex than the infinite relationships among huge sets of data.

The sheer complexity of using Hadoop has caused many companies to avoid it. Not all data comes in a neat little structure, and data formats evolve over time, meaning that solutions that worked yesterday may not work today. However, the upside of overcoming these complexity issues includes the ability to store and use massive amounts of data cheaply. Companies that previously tossed out tremendously useful data can now leverage it to solve problems and improve business operations.

Overcoming complexity issues is a matter of hiring the right IT and big data analysis professionals to tackle Hadoop and whip it into shape. However, this is costly, and sometimes the professionals needed to do the job simply aren’t available. In these cases, the best solution is to acquire a business-ready version of Hadoop from a cloud vendor such as Bigstep. For technology decision makers, Bigstep is the cloud hosting provider that offers the highest-performance public cloud in the world, and it scales for use with big data applications.

Understanding Hadoop’s Reliability Issues

Hadoop’s tendency to go kerplunk, by some accounts more than 80 percent of the time, works okay for companies like Amazon, which make their money whether Hadoop cooperates or not. But for companies that depend on Hadoop for real-time business operations, a failure rate anywhere near that level is unacceptable.

If debugging the problems is too costly and time-consuming (or simply beyond the abilities of your current IT staff), there are some fixes that can eliminate some (but not all) Hadoop crashes.

- Make sure memory size is set properly. Many crashes are the direct result of insufficient memory to handle the tasks.
- Avoid creating tasks that require excessive intermediate results. Too many intermediate results can exhaust the available storage space, which in turn leads to failure.
- Avoid requesting hundreds of nodes at a time. Instead, ask for ten or so at a time and request more as earlier requests are fulfilled.
- Avoid common errors in file configuration. Make sure to point to the right buckets, put the correct credentials into place, and double-check for typos before submitting a request.
- Avoid relying on Hadoop’s default settings. The defaults are often confusing and rarely tuned to your workload, so experiment with the settings to see which values give the best speed and meet the other criteria for the type of processing you need to do.
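
Several of the checks above can be automated before a job is ever submitted. The following is a minimal sketch in Python, not a definitive tool. It assumes a job is described by a plain dictionary; the `input_path` and `output_path` keys and both function names are hypothetical, and the property names (`mapreduce.map.memory.mb`, `mapreduce.map.java.opts`, `fs.s3a.access.key`, `fs.s3a.secret.key`) are ones found in recent Hadoop releases that may not apply to your version or distribution.

```python
import re


def parse_heap_mb(java_opts):
    """Return the -Xmx heap size in MB found in a JVM options string, or None."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        return None
    size = int(match.group(1))
    return size * 1024 if match.group(2) in "gG" else size


def preflight_check(config):
    """Collect readable problems instead of stopping at the first one."""
    problems = []

    # Memory: the JVM heap has to fit inside the container that hosts it.
    container_mb = config.get("mapreduce.map.memory.mb")
    heap_mb = parse_heap_mb(config.get("mapreduce.map.java.opts", ""))
    if container_mb is None or heap_mb is None:
        problems.append("map memory settings are missing")
    elif heap_mb >= int(container_mb):
        problems.append("JVM heap does not fit inside the map container")

    # Paths: catch typos such as a missing slash (input_path and
    # output_path are hypothetical keys used only for this sketch).
    for key in ("input_path", "output_path"):
        path = config.get(key, "")
        if not re.match(r"^s3a?://[a-z0-9][a-z0-9.\-]+/", path):
            problems.append(f"{key} does not look like a bucket URI: {path!r}")

    # Credentials: an empty value is a common cause of failed requests.
    for key in ("fs.s3a.access.key", "fs.s3a.secret.key"):
        if not config.get(key):
            problems.append(f"credential {key} is empty")

    return problems


def node_request_batches(total_nodes, batch_size=10):
    """Split one large node request into a list of smaller requests."""
    full, rest = divmod(total_nodes, batch_size)
    return [batch_size] * full + ([rest] if rest else [])
```

Calling `preflight_check(job_config)` returns a list of problems, which is empty when nothing suspicious was found, and `node_request_batches(35)` breaks a request for 35 nodes into batches of at most ten. Checks like these will not catch every failure, but they are cheap to run before each submission.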

Understanding Hadoop’s Security Issues

Is security a Hadoop problem, or just the world we live in?

Hadoop has gotten a bit of a reputation for being less than secure, but that reputation may be at least partially unwarranted. After all, many servers that aren’t running Hadoop have succumbed to cyber attacks, including those operating onsite and those in the cloud. Instead of giving up on the computing power Hadoop offers, isolate the hardware running Hadoop from outside interference. Alternatively, you can use a cloud-based service like Bigstep and let the pros handle the security issues associated with running Hadoop clusters.

The good news is that improvements to Hadoop are being made all the time. Issues that plagued users just a couple of years ago have already been addressed, and new solutions keep arriving. It’s better to begin assimilating Hadoop into daily operations now, tackling problems as they come along, than to wait while your competitors figure it out and pass you by before your Hadoop adoption is even off the ground.
