The Data Lake vs Data Warehouse debate might seem fresh, but in reality it’s simply a permeation of the Hadoop vs Data Warehouse argument. This is because Hadoop architecture is commonly used to build Data Lakes. Having a basic understanding of these technologies can positively alter the way in which information is stored and consumed in your organization.
To start with, let’s break down what a ‘Data Lake’ is: While traditionally, enterprise data warehouses (EDW) are where data is stored in a relational system, Data Lakes allow for storage of ANY type of data (structured or unstructured) in its native formats in a single pool.It’s a conceptual shift in approach to storing enterprise data.
When data is stored in native format–instead of forcing it to convert to a standardized schema (as it is in EDW)–it becomes exponentially easier to analyze. By using a Data Lake, and keeping your data unmodified, you are able to save it for future analysis, something that’s not guaranteed with pre-modeled data in EDW.
Schema conversion in a Data Lake occurs at the time of extraction, not in storage. This way data can be used on a case-by-case basis, modeling it on specific local contexts–something which the central standardization of EDW won’t allow for.
On top of this, Data Lakes are cost effective. They are based on distributed big data processing, and use broadly accepted open software standards.
If you’re intrigued by the concept of Data Lakes, schedule a meeting with us to understand how this type of storage may benefit you.