The good old data warehouse has served businesses admirably for decades. Generally structured as a relational database, it is the go-to data resource for users at every level, from tech-savvy IT staff to less technical users who just need a quick query during their regular workday. From the late 1980s onward, this was basically all a business needed.
Today the situation is different. With advances in data analytics, all data from all streams, stored in its raw, natural format, has a purpose for the organization. Well, eventually it has a purpose; that purpose often does not become apparent until some time after the data is generated. Hence, many organizations have discovered the power of the data lake.
What is a Data Lake?
A data lake is defined by TechTarget as, “A … storage repository that holds a vast amount of raw data in its native format until it is needed.” It is best understood by how it differs from the typical data warehouse:
• While the data warehouse stores only data that has been structured and added to the relational database, the data lake stores all data from all sources in its original format. The data lake includes both historical data and real-time data.
• Data is generally added to the data warehouse only after its purpose has been defined. Data is added to the data lake when it is generated, whether or not it has a determined purpose yet.
• Data warehouses are usually associated only with relational databases, while data lakes are usually associated with Hadoop. In practice, however, you can use Hadoop with relational databases as well.
• Data warehouses are useful to most users, but the highest-level users often have to go back to the source systems to pull all of the data they need for high-level analysis and insight. Data lakes, however, contain the data needed by all users, because they hold the raw data streamed from all of the data sources.
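The contrast in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular warehouse or data lake product works: the sample events, their field names, the `sales` table, and the helper functions `load_warehouse`, `load_lake`, and `read_from_lake` are all hypothetical. An in-memory SQLite table stands in for the warehouse, and a plain list of raw JSON lines stands in for the lake.

```python
import json
import sqlite3

# Hypothetical raw events from three different sources (illustrative fields only).
RAW_EVENTS = [
    '{"source": "crm", "customer_id": 7, "amount": 120.5}',
    '{"source": "clickstream", "customer_id": 7, "url": "/pricing"}',
    '{"source": "sensor", "device": "t-01", "temp_c": 21.4}',
]


def load_warehouse(raw_events):
    """Warehouse style: keep only records that fit a table defined in advance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer_id INTEGER NOT NULL, amount REAL NOT NULL)"
    )
    for line in raw_events:
        record = json.loads(line)
        # Records without a predefined purpose (no matching columns) are dropped.
        if "customer_id" in record and "amount" in record:
            conn.execute(
                "INSERT INTO sales VALUES (?, ?)",
                (record["customer_id"], record["amount"]),
            )
    return conn


def load_lake(raw_events):
    """Lake style: keep every record exactly as it arrived."""
    return list(raw_events)


def read_from_lake(lake, required_fields):
    """Apply a structure only at read time, once a purpose is known."""
    rows = []
    for line in lake:
        record = json.loads(line)
        if all(field in record for field in required_fields):
            rows.append({field: record[field] for field in required_fields})
    return rows


warehouse = load_warehouse(RAW_EVENTS)
lake = load_lake(RAW_EVENTS)

warehouse_rows = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
page_views = read_from_lake(lake, ["customer_id", "url"])

print(warehouse_rows)  # records that fit the predefined table
print(len(lake))       # records retained in raw form
print(page_views)      # a question asked later, answered from the raw data
```

In this sketch the clickstream and sensor records never reach the warehouse because no table was defined for them, while the lake still holds them and can answer a question that was only posed after the data arrived.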
Do You Need a Data Lake?
As you can see, the data lake is not necessarily something you get instead of the data warehouse. A data lake can be built in addition to your data warehouse. The benefit of adding a data lake to your business’ data storage repository is the ability to leverage large, often unstructured data streams, such as social media feeds, clickstream data, machine logs, data from various sensors, exports from software solutions like CRM and ERP packages, exports from RDBMS or NoSQL databases, and other ‘big data’ streams.
If your business uses, or plans to use, one or more of these big data streams, then a data lake is likely to meet your needs much better than the typical EDW (enterprise data warehouse). The first Data Lake as a Service is here to meet those needs!