What is the difference between a Data Warehouse and a Data Lake?

Viewed 140 times

5

Considering the concepts of Data Warehouse and Data Lake, what notable differences can we cite between them?

  • I’ll wait for other people’s opinions, but to me this does not seem to be within the site’s scope

  • Bacco, since there is a "business-intelligence" tag and this is an objective question on the subject, I believe it is valid. The last bullet point of the site scope reads: "theoretical doubts about concepts and practices applied to software development". This question is pertinent for guiding anyone who wants to develop either of these two things, right? How do you build a Data Warehouse? And a Data Lake? But what’s the difference after all?

  • 2

    Tags do not define scope; they are merely for organization. Ubuntu is a tag, and even so questions about OS are accepted. The site is exclusively about programming and related tools (as far as they are used for programming). For example, a question about an Excel formula fits, but one about how to format a page in Excel does not. But if you can rework the question to focus the discussion on the scope, perhaps that addresses the issue.

  • 2

    Anyway, I understand if you think it’s not within the scope. Perhaps I need to improve how I ask, to make clear that the question is about developing these two resources.

  • https://martinfowler.com/bliki/DataLake.html

  • 2

    @Piovezan, so the difference is that the Warehouse goes through a treatment zone and the Lake is more like the Tietê?

1 answer

3


I will venture a brief answer, since no one else has replied yet:

The Data Lake is a concept created to solve problems that the Data Warehouse had difficulty solving, namely the V’s of Big Data. In this answer I will cover what I consider to be the 2 major differences:

Volume:

Today, with the high volume of data being produced, it would be very expensive to store all this information in a DW, since DWs are mostly built on top of DBMSs.

Thus the Data Lake proposes storing the data on cheap hardware, reducing information storage costs.

In fact, some Data Lake architectures even have an "Archive" zone, to which data that has reached the end of its life cycle in the DW is moved.
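To make the "Archive" zone idea concrete, here is a minimal sketch, not a real architecture. It assumes an in-memory SQLite database standing in for the DW, a hypothetical `sales` table, and a hypothetical `lake/archive` directory standing in for the cheap storage. Rows older than a cutoff date are written to a plain CSV file in the archive zone and then removed from the DW.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def archive_expired(conn: sqlite3.Connection, archive_dir: Path, cutoff: str) -> int:
    """Move sales rows older than `cutoff` (ISO date) from the DW to a CSV file."""
    rows = conn.execute(
        "SELECT id, sold_on, total FROM sales WHERE sold_on < ? ORDER BY id",
        (cutoff,),
    ).fetchall()
    if not rows:
        return 0
    archive_dir.mkdir(parents=True, exist_ok=True)
    out = archive_dir / f"sales_before_{cutoff}.csv"
    with out.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "sold_on", "total"])
        writer.writerows(rows)
    # Only delete from the DW after the archive file has been written.
    conn.execute("DELETE FROM sales WHERE sold_on < ?", (cutoff,))
    conn.commit()
    return len(rows)


def demo() -> tuple:
    # SQLite is only a stand-in for the warehouse (assumption for illustration).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sold_on TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "2015-03-01", 10.0), (2, "2016-07-09", 20.0), (3, "2024-01-15", 30.0)],
    )
    with tempfile.TemporaryDirectory() as tmp:
        moved = archive_expired(conn, Path(tmp) / "lake" / "archive", "2020-01-01")
        remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    conn.close()
    return moved, remaining
```

Calling `demo()` moves the two old rows to the archive file and leaves one row in the DW. In practice the archive would live on object storage or a distributed file system rather than a local directory, but the flow is the same.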

Variety:

With the advent of new unstructured data sources (photos, PDFs, audio, video, etc.) and new ways of analysing this data (Machine Learning, Deep Learning), companies have felt the need to store this information, but it cannot or should not be stored in a DBMS.

Thus, we have Data Lakes built on file directories, which allow the inclusion of any data source.
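As a rough illustration of a lake built on file directories, the sketch below copies files of any type, unchanged, into a date-partitioned "raw" zone. The layout `raw/<source>/<date>/`, the function name `land_file` and the source name `crm` are my own assumptions for the example, not a standard.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def land_file(lake_root: Path, source: str, file_path: Path, day: date) -> Path:
    """Copy a file as-is into raw/<source>/<yyyy-mm-dd>/ under lake_root."""
    target_dir = lake_root / "raw" / source / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_path.name
    shutil.copy2(file_path, target)  # bytes are stored untouched, no schema imposed
    return target


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        inbox = tmp_path / "inbox"
        inbox.mkdir()
        # Structured and unstructured files land the same way (fake contents).
        (inbox / "orders.csv").write_text("id,total\n1,9.90\n")
        (inbox / "scan.pdf").write_bytes(b"%PDF-1.4 fake bytes")
        (inbox / "call.wav").write_bytes(b"RIFF fake bytes")
        lake = tmp_path / "lake"
        for f in sorted(inbox.iterdir()):
            land_file(lake, "crm", f, date(2024, 1, 15))
        return sorted(
            p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file()
        )
```

The point of the sketch is that nothing is parsed or validated on the way in: a CSV, a PDF and an audio file all end up side by side, and any structure is applied later, when the data is read.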

Conclusion:

There are other differences, but the answer has already gotten too big for SO. To finish: today most companies use both concepts in their data architecture, where these technologies complement each other.
