From data warehouse, to data lake, to data layer, to data fabric: the next wave of tech

The next wave of innovation producing the complete virtual data landscape

by Tech Wire Asia

As the possibilities of the enterprise’s data become apparent, many organizations are rightly investing in the technology that they might use to mine this most valuable of 21st-century resources.

Since the early days of digitization, companies have collected and stored information on every aspect of their operations. Carefully archived and stashed away, early data repositories were essential for rolling back in the event of an IT failure. But even as technology became more reliable, only the most foolhardy institutions abandoned the habit of keeping data. Today, most enterprise-scale organizations have at their disposal multiple data lakes, databases almost without number, and perhaps as many uses for the accrued information as they have bytes stored.

The first wave

Data virtualization technologies emerged early in this story, but only really came to the fore as companies began to use cloud storage at scale, taking advantage of resources that were relatively cheap, automatically compressed, available on demand, and elastic.

Aggregating data into a single virtualized whole was the first wave of abstraction, with solutions providing a single point of access to data regardless of where it originated. The emerging field of data science could now draw on almost every scrap of information across the enterprise, even mining sources that might not, at first glance, seem to yield much value. But it was the business intelligence that enterprises could draw from total aggregation that first opened decision-makers’ eyes to the value of what they already possessed (and were adding to daily): the information flows and stores of the business at large.
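To make the idea of a single virtualized source concrete, here is a minimal Python sketch under stated assumptions: two hypothetical sources (a CSV export and a SQLite table, with invented names such as `sales`, `area`, and `total`) are wrapped in adapters that map each onto one common record shape, and a single generator presents them as one logical stream. It illustrates the general technique only, not any vendor's product.

```python
import csv
import io
import sqlite3
from typing import Iterator

# Hypothetical source 1: a CSV export from, say, a legacy file share.
CSV_EXPORT = "order_id,region,amount\n1,APAC,120.0\n2,EMEA,80.5\n"


def csv_source(text: str) -> Iterator[dict]:
    """Yield CSV rows mapped onto the common record schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"order_id": int(row["order_id"]), "region": row["region"],
               "amount": float(row["amount"]), "source": "csv"}


def sqlite_source(conn: sqlite3.Connection) -> Iterator[dict]:
    """Yield rows from a table with different column names, in the same schema."""
    for order_id, region, amount in conn.execute(
            "SELECT id, area, total FROM sales ORDER BY id"):
        yield {"order_id": order_id, "region": region,
               "amount": amount, "source": "sqlite"}


def virtual_orders(*sources: Iterator[dict]) -> Iterator[dict]:
    """Present all sources as one logical stream; records are read lazily, not copied up front."""
    for source in sources:
        yield from source


# Hypothetical source 2: an operational database table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER, area TEXT, total REAL)")
db.execute("INSERT INTO sales VALUES (3, 'APAC', 42.0)")

orders = list(virtual_orders(csv_source(CSV_EXPORT), sqlite_source(db)))
apac_total = sum(o["amount"] for o in orders if o["region"] == "APAC")
print(apac_total)  # 162.0
```

The consumer asks one question ("total APAC sales") without knowing that the answer spans a file and a database; adding a third source means writing one more adapter, not changing the query.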

The second wave

At Tech Wire Asia, we’re looking at the next generation of data virtualization technologies: the second wave of this type of business intelligence toolset. Data fabric providers’ solutions can empower not just the specialist data scientist and researcher, but any member of the workforce capable of writing a simple SQL query or even building an Excel formula or two.

But beyond abstracting disparate data sources into one giant meta-pool, data fabric layers are coherent, conversational, and dynamic layers of information that can be accessed on demand and in real time.

Traditional data management solutions typically rely on data copies, which are relatively static and must be built on a per-request, per-use, or per-app basis. Data fabric virtualization, by contrast, avoids most of that copying, which makes it more efficient and safer. Definitions sit alongside data sets and queries; the fabric provides a route to the answers, with questions posed through virtual APIs.
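The difference between a copy and a virtual definition can be seen in plain SQL. The sketch below uses Python's built-in `sqlite3` module with an invented `customers` table, and assumes a SQL view is a fair (if simplified) stand-in for a fabric's stored definition. The copy is frozen at creation time, while the view reflects new source data at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.execute("INSERT INTO customers (name, country) VALUES ('Aiko', 'JP'), ('Ben', 'SG')")

# Copy-based approach: a per-request snapshot that must be rebuilt to stay current.
conn.execute("CREATE TABLE sg_customers_copy AS SELECT * FROM customers WHERE country = 'SG'")

# Virtual approach: only the definition is stored; data is read when queried.
conn.execute("CREATE VIEW sg_customers_view AS SELECT * FROM customers WHERE country = 'SG'")

# A new record arrives in the source after both were created.
conn.execute("INSERT INTO customers (name, country) VALUES ('Chen', 'SG')")

copy_count = conn.execute("SELECT COUNT(*) FROM sg_customers_copy").fetchone()[0]
view_count = conn.execute("SELECT COUNT(*) FROM sg_customers_view").fetchone()[0]
print(copy_count, view_count)  # 1 2
```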

Data fabric’s powerful data governance facilities also mean access can be determined at any level of granularity anywhere in the business, from a single cell up to an entire department’s access to country-specific data. For data governance managers, this advantage alone is a godsend.
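Row- and cell-level governance of the kind described above can be sketched as a policy applied at read time. In this hypothetical Python example, the roles (`sg_hr`, `sg_sales`), the `Policy` class, and the data are all invented for illustration; real data fabric platforms express such policies in their own configuration, not in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical access policy for one role."""
    allowed_countries: frozenset  # row level: whose records are visible
    masked_columns: frozenset     # cell level: which fields are redacted


ROWS = [
    {"name": "Aiko", "country": "JP", "salary": 91000},
    {"name": "Ben", "country": "SG", "salary": 87000},
    {"name": "Chen", "country": "SG", "salary": 78000},
]

POLICIES = {
    "sg_hr": Policy(frozenset({"SG"}), frozenset()),
    "sg_sales": Policy(frozenset({"SG"}), frozenset({"salary"})),
}


def governed_read(rows, role):
    """Apply row filtering and cell masking for the given role."""
    policy = POLICIES[role]  # an unknown role raises KeyError: deny by default
    return [
        {col: ("***" if col in policy.masked_columns else val) for col, val in row.items()}
        for row in rows
        if row["country"] in policy.allowed_countries
    ]


sales_view = governed_read(ROWS, "sg_sales")
print(sales_view)
# [{'name': 'Ben', 'country': 'SG', 'salary': '***'}, {'name': 'Chen', 'country': 'SG', 'salary': '***'}]
```

Because the policy is evaluated on every read rather than baked into a copy, tightening access for a role takes effect immediately for every consumer.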

Data fabric solutions enable users to create sophisticated models (including control logic) that effectively act as an API onto the information. That API can then be consumed by an Excel sheet, a fully fledged business intelligence suite, a visualization tool, or any layer of code, whether a bespoke build or an existing application.
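As a rough illustration of a model with control logic exposed as an API, the sketch below wraps an aggregation with a validation rule and serializes the result as JSON, a format that spreadsheets, BI tools, and custom code can all consume. The `SALES` data, `revenue_by_region`, and `handle_request` are hypothetical; a production fabric would put this behind an HTTP endpoint or SQL interface rather than a plain function.

```python
import json

# Hypothetical governed data set sitting behind the fabric.
SALES = [
    {"region": "APAC", "quarter": "Q1", "revenue": 120.0},
    {"region": "APAC", "quarter": "Q2", "revenue": 150.0},
    {"region": "EMEA", "quarter": "Q1", "revenue": 90.0},
]


def revenue_by_region(quarter=None):
    """A 'model' with control logic: validate input, optionally filter, then aggregate."""
    if quarter is not None and quarter not in {"Q1", "Q2", "Q3", "Q4"}:
        raise ValueError(f"unknown quarter: {quarter!r}")
    totals = {}
    for row in SALES:
        if quarter is None or row["quarter"] == quarter:
            totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals


def handle_request(params):
    """Serve the model as JSON so any client can consume the same answer."""
    return json.dumps(revenue_by_region(params.get("quarter")), sort_keys=True)


print(handle_request({}))                 # {"APAC": 270.0, "EMEA": 90.0}
print(handle_request({"quarter": "Q1"}))  # {"APAC": 120.0, "EMEA": 90.0}
```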

There is an unfortunate tendency among data aggregation vendors to simply change the terminology used to describe their platforms — in terms of language, it’s a short hop from data aggregation to data fabric. But here at Tech Wire Asia, we’re examining four vendors whose products represent the true best of this second wave of data fabric solutions.

As ever in these groupings, no single solution will represent the best choice for every enterprise, but we feel that you will find a solution for your data intelligence requirements among the following four companies.

ZETARIS

Zetaris is modernizing and transforming data access, analytics and BI.

Zetaris describes its Data Fabric as the world’s fastest distributed SQL query engine, with real-time data quality and governance “built in”. You can execute SQL queries across your data landscape, joining data in your data lakes to your data warehouse, file systems, IoT platforms, or cloud databases to create a single metadata view of your business. This is a Hybrid (Cloud) Data Warehouse.

Zetaris achieves instant access to data by separating the query logic and metadata from the raw data storage. This vastly reduces data duplication and processing costs whilst negating the need for complex data pipelining or ETL.
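The principle of separating query logic and metadata from storage can be sketched with SQLite, which can attach several databases to one connection and join across them. In this Python sketch, two in-memory databases stand in for a warehouse and a lake, and the `CATALOG` dictionary is a hypothetical metadata layer that maps logical table names to physical locations at query time. It is not how Zetaris works internally, only an illustration of the idea: moving a table means updating one catalog entry, not rewriting every query.

```python
import sqlite3

# One engine, two separately attached stores: "main" plays the warehouse,
# "lake" plays a data lake. Both are in-memory stand-ins for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

conn.execute("CREATE TABLE main.customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO main.customers VALUES (?, ?)", [(1, "Aiko"), (2, "Ben")])
conn.execute("CREATE TABLE lake.clicks (customer_id INTEGER, page TEXT)")
conn.executemany("INSERT INTO lake.clicks VALUES (?, ?)",
                 [(1, "/pricing"), (1, "/docs"), (2, "/pricing")])

# Hypothetical metadata layer: logical names mapped to physical locations.
CATALOG = {"customers": "main.customers", "clicks": "lake.clicks"}


def resolve(sql):
    """Rewrite {logical} placeholders into physical table names at query time."""
    return sql.format(**CATALOG)


query = resolve(
    "SELECT c.name, COUNT(*) AS visits "
    "FROM {customers} AS c JOIN {clicks} AS k ON k.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
)
result = conn.execute(query).fetchall()
print(result)  # [('Aiko', 2), ('Ben', 1)]
```

Because the catalog is substituted with `str.format`, it must only ever hold trusted, administrator-defined names; user input should never be spliced into SQL this way.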

Zetaris says it makes data projects ten times simpler, compressing project timelines from months to days. According to the Zetaris TCO calculator, medium-to-complex projects see a sevenfold reduction in cost.

Zetaris is used for the following use cases:

Hybrid (Cloud) Data Warehouse
Self-service BI layer
Legacy (data warehouse / data lake / data mart) to cloud migration
Governed Data Sharing

Zetaris is a disruptive, next-generation data platform. Visit www.zetaris.com to sign up for a 30-day free trial.

TIBCO

Companies use TIBCO’s data virtualization and fabric management software to reduce data bottlenecks caused by repetitive collation, cleaning, and querying of information. That leads, simply enough, to better business outcomes.

While data engineers, research teams, and analytics specialists are the most obvious users of data virtualization, TIBCO’s ideal is that the same functionality (previously available only in the data specialist’s opaque language) is spread across all enterprise users. Why shouldn’t application developers benefit, for example?

Data virtualization allows applications, visualization, and BI tools to access and use data regardless of how it’s formatted or where it is physically located. That is step one. By dynamically de-duplicating, validating, and cleaning data as it moves in real time, the platform can rapidly create reusable data services that deliver analytics, with even heavy read workloads completed securely and quickly. That’s step two.
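The de-duplicate, validate, and clean step can be pictured as a streaming filter. The Python sketch below is an illustration under assumptions, not TIBCO's implementation: records are keyed by a normalized `email` field (a hypothetical choice of key), records failing a deliberately crude validity rule are dropped, and only the first record per key is kept.

```python
from typing import Iterable, Iterator


def clean_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Normalize, validate, and de-duplicate records as they flow through."""
    seen = set()
    for rec in records:
        email = (rec.get("email") or "").strip().lower()  # clean: normalize case and whitespace
        if "@" not in email:                               # check: drop records failing a basic rule
            continue
        if email in seen:                                  # de-dupe: keep first record per key
            continue
        seen.add(email)
        yield {**rec, "email": email}


incoming = [
    {"id": 1, "email": " Aiko@Example.com "},
    {"id": 2, "email": "aiko@example.com"},  # duplicate after normalization
    {"id": 3, "email": "not-an-email"},      # fails validation
    {"id": 4, "email": "ben@example.com"},
]
cleaned = list(clean_stream(incoming))
print([r["id"] for r in cleaned])  # [1, 4]
```

The `seen` set grows without bound; a long-running stream would need to limit it, for example to a time window.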

Data services coalesce into a common data layer that supports a wide range of analytics and application use cases in whichever area of the enterprise needs them. And in the modern organization, there are no divisions or job roles without a data component.

Data engineers already have access to the business insights that the decision-makers need. TIBCO is determined that those decisions be better informed, and who better to draw out the insights from data than the decision-makers themselves?

To read more about TIBCO, follow this link.

TALEND

As well as pulling together data from across the organization’s sources, Talend offers the Talend Trust Score, which gives decision-makers insight into the veracity of the information on which they base their decisions.

The company’s Pipeline Designer lets enterprises plan how their data sets are assembled, with real-time control over the process. Data quality is assessed automatically, ensuring that rogue information does not skew the resulting insights.
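Automatic data quality assessment is often built from simple rules scored across a batch. The sketch below is a generic, hypothetical example and is not Talend's Trust Score formula, which the article does not describe: each rule's score is the fraction of records passing it, and the batch is flagged as unusable if any score falls below an assumed 0.9 threshold. The rule names and accepted currencies are invented for illustration.

```python
# Hypothetical quality rules; a real platform would let users define these.
RULES = {
    "has_customer_id": lambda r: r.get("customer_id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    "known_currency": lambda r: r.get("currency") in {"USD", "SGD", "JPY"},
}


def quality_report(records, rules=RULES, threshold=0.9):
    """Score each rule as the fraction of records passing it; the batch is usable only if all scores meet the threshold."""
    if not records:
        raise ValueError("cannot score an empty data set")
    scores = {name: sum(rule(r) for r in records) / len(records) for name, rule in rules.items()}
    return scores, all(score >= threshold for score in scores.values())


batch = [
    {"customer_id": 1, "amount": 10.0, "currency": "SGD"},
    {"customer_id": 2, "amount": -5.0, "currency": "USD"},
    {"customer_id": None, "amount": 7.5, "currency": "EUR"},
    {"customer_id": 4, "amount": 3.0, "currency": "JPY"},
]
scores, usable = quality_report(batch)
print(scores)  # {'has_customer_id': 0.75, 'amount_non_negative': 0.75, 'known_currency': 0.75}
print(usable)  # False
```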

Talend’s data fabric presents an abstraction over truly multipurpose data, and real-time data processing is available thanks to the platform’s deep integration with Apache Spark.

User-friendly APIs present data for various use cases (forget your notions of code-heavy API calls and token exchanges), so new projects can be quickly brought up and given access to wide-ranging data resources.

Because data ingestion can be set up simply even for the most complicated applications, the results of expensive development are far less likely to be tainted by poor-quality data. That gives organizations confidence in what they do digitally, making them better able to outcompete less data-savvy rivals. Learn more about this governance-friendly yet eminently usable raft of solutions here.

ATSCALE

Some of the data resources available to the enterprise decision-maker have historically been reachable only via the “extra loop” of the data specialist: someone trained in, and equipped with, the tools required to parse and normalize different data types. The process was time-consuming and expensive, and even the best-funded enterprises could only ever produce data that was out of date by the time it was presented.

AtScale hopes to remedy this by providing a data fabric that’s immediately accessible to the end user, without time-consuming waypoints. AtScale even connects natively to that stalwart of business software, Excel, and integrates with just about every other business application, wherever it happens to be hosted.

A self-service analysis culture might be a lofty goal, but AtScale provides the data fabric layer that makes it a reality. There is no need for an intermediary, not even a “citizen data scientist”: the knowledge is right there, so any of the enterprise’s functions can draw self-service business intelligence from it.

And with market-leading data aggregation technologies underpinning the data fabric, users can work from a complete picture of their data, so decisions rest on concrete, empirical facts rather than speculation or gut feeling.

To learn more about AtScale, follow this link.

*Some of the companies featured in this article are commercial partners of Tech Wire Asia
