Seeing clearly through the data lake’s surface with Alation
Data lakes solve the problem of storing large amounts of data but do little to help business users collaborate and derive insights from that data. Wading through the complexity of the data lake can be daunting, resulting in lack of use and loss of trust in the lake.
Self-service analytics in the data lake require that users can trust the data and understand the nuances of its path, from raw files to traditional relational data sets and eventually to BI dashboards and reports.
That’s where a data catalog comes in. By employing a data catalog along with your data lake, you can tell not only whether the data is accurate and relevant but also whether it is trustworthy (i.e., whether it has been certified and validated by a data steward in the organization). While a data catalog can make it easier to find, understand, and establish trust in data, not all data catalogs are created equal.
Alation extends access to business-critical data to a broad set of users across an organization (not just IT or traditional analysts) so that they can immediately derive thoughtful insights from their data lake.
Alation addresses the complexity of working on data lakes with an enterprise data catalog that is designed to curate and share context throughout the enterprise. By leveraging artificial intelligence to automatically capture the rich context of enterprise data, Alation surfaces relationships between data sets, analyst usage, and relevant insights.
This not only empowers users to find information but also captures tribal knowledge, improving productivity and increasing the confidence users need in order to trust the data. Because of these methods, Alation’s solution provides the enterprise with reassurance that data drawn from a data lake has measurable business worth.
One of the bigger challenges in deploying a successful data lake has been the inability to generate a holistic view of the data pipeline. This becomes more challenging as enterprises move to a hybrid cloud environment, with data stored not just on-premises but also in the cloud.
Alation delivers a transparent view across all data stored in the data lake by means of:
Native integration with traditional relational datasets (Hive) and file systems (Amazon S3 and HDFS).
Integration with data processing engines such as SparkSQL, HiveQL, Presto, and Impala.
Integration with leading data preparation tools (Paxata and Trifacta).
Cataloging business intelligence reports from various vendors such as Tableau and MicroStrategy.
Having broad visibility into the modern data pipeline allows users to discover information in an easily digestible format, creating a single source of reference to minimize the difficulty of managing the data lake.
A data lake has many advantages, but inherent challenges have made it difficult to deploy one successfully. Companies such as Munich Re and Blue Cross Blue Shield have realized that Alation’s data catalog is essential to mitigating those challenges.
Those obstacles are overcome by giving users the ability to share knowledge within the organization, rank and rate queries, and find out not only what’s inside the data lake but also whether that data is trustworthy and useful.
“Alation’s social catalog is part of our self-service data analytics platform and already helps more than 600 users to discover data easily and to share knowledge with one another […] we expect [collation] time to go down significantly,” said Wolfgang Hauner, Munich Re’s chief data officer.
To learn more about Alation and schedule a demo, click here.