Technically Speaking

The Official Bigstep Blog

 

What Separates a Successful Data Lake from an Unsuccessful One?

The data lake is a relative newcomer to the land of data storage, but it’s rapidly making a name for itself for several reasons. Data lakes are ideal for organizations that know big data is a huge part of their future but haven’t yet defined how that will work. Data lakes don’t hamstring you the way data warehouses and other storage options tend to, because you can store the data in its native format and leave it ‘au naturel’ until you determine a use for it. With inexpensive cloud storage options, data lakes are also quite affordable to set up and maintain. So, how can you construct a data lake that will deliver a hearty return for your time, effort, and money?

A Data Lake is an All-or-Nothing Design

If each department attempts to set up its own data lake, you end up with a lot of little data ponds that aren’t really useful to anyone. A data lake has to be a coordinated development that the entire organization contributes to and participates in.

One of the most common mistakes that organizations make when attempting to build a data lake is to accidentally construct lots of data ponds instead. Data ponds are what happen when each department tries to set up its own data lake, but the efforts are never completed or combined into a holistic data storage solution. The data lake should be a complete repository of all of the data from all of the disparate sources, stored in its original format for all to use and enjoy. That takes an organization-wide approach. Either build a complete data lake or keep your data warehouses and silos.

A Data Lake is Not a Complete Data Strategy

Though data lakes prove to be immensely valuable, the data lake can’t be the sum total of your strategy for big data. In other words, you can’t assume that if you set up a data lake it will be utilized to its potential. You have to mandate use and incorporate the data lake into an overall strategy for leveraging big data. For example, define how the data lake will be used by your developers in future applications and establish what systems and sources will feed data into the data lake. Make it clear from the beginning how the data lake fits into your data strategy so that it doesn’t just sit there and rack up storage charges.
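One way to make that strategy concrete is to keep a small registry of what feeds the lake and who is expected to use it. The sketch below is only an illustration under assumed names: the FeedSource record, its fields, and the example systems are hypothetical and do not come from any particular product. It flags sources that have no planned consumer, which are the ones most likely to sit there and rack up storage charges.

```python
from dataclasses import dataclass, field


@dataclass
class FeedSource:
    """Hypothetical registry entry for one system feeding the data lake."""
    name: str                  # system or source that writes into the lake
    owner: str                 # team accountable for the feed
    consumers: list = field(default_factory=list)  # planned applications or teams


def unused_sources(sources):
    """Return the names of sources that nobody has committed to using."""
    return [s.name for s in sources if not s.consumers]


# Example registry; every name here is made up for illustration.
sources = [
    FeedSource("crm_exports", owner="sales-ops", consumers=["churn-model"]),
    FeedSource("web_clickstream", owner="marketing"),
    FeedSource("billing_events", owner="finance", consumers=["revenue-dashboard"]),
]

print(unused_sources(sources))  # ['web_clickstream']
```

A list like this can be reviewed whenever a new source is proposed, so that every feed arrives with at least one intended use.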

Automate Meta Tagging

Without the proper meta tags, the data that goes into the data lake isn’t likely to ever see the light of day again, and the lake quickly becomes a data swamp. Meta tags should include rich information that fully describes what each piece of data is and where it came from. Also include a way to determine how the data has been used historically. Because data lakes house enormous amounts of unstructured data, anything without a clear and complete description simply won’t be found again. Automated and detailed meta tagging is essential in both building and managing a good data lake.
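As a rough illustration of what automated tagging could look like at ingest time, here is a minimal Python sketch using only the standard library. The MetaTag record, its field names, and the tag_file helper are assumptions made for this example rather than part of any specific data lake platform. The sketch captures what the data is (content type, size, checksum), where it came from (source system), and how it has been used (an access log).

```python
import hashlib
import json
import mimetypes
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class MetaTag:
    """Hypothetical metadata record kept alongside each raw object."""
    path: str
    source_system: str      # where the data came from
    content_type: str       # what the data is
    size_bytes: int
    sha256: str             # checksum, useful for spotting duplicates
    ingested_at: str        # UTC timestamp in ISO 8601 format
    access_log: list = field(default_factory=list)  # how the data has been used

    def record_use(self, consumer: str, purpose: str) -> None:
        """Append one usage entry so the history stays with the data."""
        self.access_log.append({
            "consumer": consumer,
            "purpose": purpose,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self) -> str:
        """Serialize the tag, for example to write it out as a sidecar file."""
        return json.dumps(asdict(self), indent=2)


def tag_file(path: Path, source_system: str) -> MetaTag:
    """Build a metadata tag for one file without any manual input."""
    data = path.read_bytes()
    guessed_type, _ = mimetypes.guess_type(path.name)
    return MetaTag(
        path=str(path),
        source_system=source_system,
        content_type=guessed_type or "application/octet-stream",
        size_bytes=len(data),
        sha256=hashlib.sha256(data).hexdigest(),
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )
```

In use, an ingest job would call tag_file on every incoming file, store the result of to_json next to the raw object or in a catalog, and call record_use each time an application reads the data. Reading the whole file into memory keeps the sketch short; large objects would need a streaming checksum instead.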

Picture Use Cases for the Data That Goes Into the Lake

You will hear a lot about how the beauty of a data lake is that you don’t have to determine use cases for the data when constructing it. While that is true, you will want to consider at least a few potential use cases so that the lake is set up to serve your organization’s purposes. This is also a great way to draft arguments for top executives in order to secure funding for the data lake project. When you can illustrate how the data will be useful, it’s a lot easier to get the brass to sign off on the expenses.

Would you like to see how others have leveraged the power of the Full Metal Data Lake? Read our customer stories. Then you can set up an appointment to discuss your data storage needs with the experts at Bigstep.
