We’ve all seen the marketing hype surrounding the data lake. Data lakes are much like Michael Corleone at the end of The Godfather: they promise to answer all your questions and solve all your problems. However, as with Michael’s pronouncements at the end of The Godfather, there is a downside to this “offer” that marketers may think we cannot refuse. There is usually a set of stakeholders who are unfamiliar with Hadoop or the concept of a data lake, or who are simply not interested in changing the status quo of their organizations.
As a data architect, you pitch a data lake the way travel websites or George Clooney movies present mountain lakes: cool, clear, and usually reflecting a snow-tipped mountain peak on the surface to show the purity of the contents within. Everyone wants to drink water from this source. However, when some people hear the concept (data from many sources stored without a schema for some possible future benefit), they will think of a data swamp rather than a pristine data lake.