Fabrication of Business Insights: Lakes, Fabrics, Queries and Streams
Drawing insights from data takes up relatively little of the limited time data professionals have in a working day. While the outcomes of gathering, parsing, processing, and presenting data can be game-changing for many organizations, most time and resources are spent working through the difficulties present in almost every data-related project.
The common stumbling blocks (read: resource drains) differ wildly from situation to situation but usually involve one or more of the following:
– Data locations. Most companies use a mix of on-premise facilities, remote cloud provisions, archive locations, and real-time data feeds. Even auditing which of these exist tends to fall to the data scientist, which in most cases is a debatable use of a well-trained (if not well-paid) resource.
– Unstructured and structured data. Technologies like NLP and neural networks can be leveraged to create organized and unified data models, but these processes have inherent problems, especially for real-time data analysis.
– Data format differences. A broad range of integration tools can help resolve differences and create coherent schemas, but integration platforms rarely cover all the bases.
– Scalability of resources. Virtual lab-based work may be effective for trial data sets but can run into operational difficulties at scale when deployed in production.
A few years ago, addressing these concerns usually meant creating a data lake so that skilled professionals could access and analyze information. Some of these solutions were highly effective, especially for auditing and cataloging resources. But there were concerns about data duplication, and for real-time processing and stream-based operations, data lakes are often ineffective without additional assets and resources.
Data fabrics were therefore the next logical step: a layer of abstraction that adapts in real time to the underlying structure of the environment. As the arbiter between sources and applications, a data fabric helps improve data quality and oversight for all uses and can respond to different needs from the far reaches of the enterprise.
Today’s leading data fabric providers offer fast, standardized connections to common tools and bespoke applications, handling issues such as those outlined above that would otherwise drain finite resources into basic preparation and parsing.
Fabrics also give a wider range of users quick access to information. They do not, per se, produce insightful results with next to no effort or expertise, but they can expose data sources that might otherwise have been unavailable. These sources are addressable via standard query languages, and queries can be directed at data anywhere in the distributed enterprise: public clouds, on-premise systems, and private clouds.
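As a loose analogy for one standard query spanning several stores, the sketch below uses Python’s built-in sqlite3 module: two in-memory databases stand in for a hypothetical on-premise store and a hypothetical cloud store, and an ordinary SQL join runs across both via SQLite’s ATTACH DATABASE statement. The table names, schema, and data are invented for illustration, and this is not how any product named in this article is implemented; a real fabric federates queries across networked systems rather than local databases.

```python
import sqlite3

# Hypothetical setup: the main connection plays an "on-premise" store,
# and an attached in-memory database plays a "cloud" store.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS cloud")

# Invented example tables, one in each "location".
conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE cloud.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)
conn.executemany(
    "INSERT INTO cloud.orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.5)],
)

# One standard SQL statement spans both stores; the caller only needs
# the logical table names, not the details of where each table lives.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.total) AS revenue
    FROM main.customers AS c
    JOIN cloud.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

for name, revenue in rows:
    print(name, revenue)
```

The point of the analogy is the query side: the same SQL would be written whether the two tables lived together or apart, which is the convenience a fabric aims to offer at enterprise scale.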
Data fabric solutions support fast, real-time ingestion from multiple sources (such as time-sensitive IoT data delivered over MQTT) and can arbitrate simultaneous ingestion from sources as disparate as Kafka streams and industrial devices communicating over legacy protocols.
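To make the idea of arbitrating between disparate feeds a little more concrete, here is a minimal Python sketch under loudly stated assumptions: it uses no real MQTT or Kafka client, and the message shapes (a JSON payload with `ts` and `value` fields for the MQTT-style feed, a dict with `timestamp_ms`, `key`, and `value` for the Kafka-style feed) are invented for illustration. Each feed is normalized into one hypothetical `Reading` record, and the already time-ordered feeds are interleaved by timestamp with the standard library’s `heapq.merge`. A production data fabric handles far more (back-pressure, late or out-of-order events, schema evolution), so treat this as a sketch of the normalize-then-merge pattern only.

```python
import heapq
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical common schema that every source is normalized into."""
    ts: float      # seconds since epoch
    source: str    # which feed the record came from
    sensor: str    # topic or key identifying the data series
    value: float


def from_mqtt(topic: str, payload: bytes) -> Reading:
    # Assumed payload shape: {"ts": <seconds>, "value": <number>}
    body = json.loads(payload)
    return Reading(float(body["ts"]), "mqtt", topic, float(body["value"]))


def from_kafka(record: dict) -> Reading:
    # Assumed record shape: {"timestamp_ms": <int>, "key": <str>, "value": <number>}
    return Reading(record["timestamp_ms"] / 1000.0, "kafka", record["key"], float(record["value"]))


# Invented sample messages; each feed is already in time order.
mqtt_msgs = [
    ("plant/line1/temp", b'{"ts": 100.0, "value": 71.5}'),
    ("plant/line1/temp", b'{"ts": 102.0, "value": 72.0}'),
]
kafka_msgs = [
    {"timestamp_ms": 101000, "key": "orders/eu", "value": 3},
    {"timestamp_ms": 103500, "key": "orders/eu", "value": 5},
]

# Normalize lazily, then interleave the ordered feeds by timestamp.
merged = list(
    heapq.merge(
        (from_mqtt(topic, payload) for topic, payload in mqtt_msgs),
        (from_kafka(record) for record in kafka_msgs),
        key=lambda r: r.ts,
    )
)

for r in merged:
    print(r.ts, r.source, r.sensor, r.value)
```

Merging by timestamp this way assumes each feed arrives in order; out-of-order events would need buffering or windowing before this step.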
At present, data fabric technologies are most commonly found in the medical and financial sectors, although as the possibilities become more widely recognized (and more sectors digitize operations), this picture is bound to change.
We predict that the next two big growth areas for the technology will be B2C retail (because of the large amounts of parsable data, especially in marketplace operations) and manufacturers with high numbers of SKUs or highly mechanized production lines. As is often the case, data fabric technologies may well enter these sectors first through their finance departments, where unstructured data and repetitive tasks are the norm.
Predictions aside, if your organization is data-rich and insight-poor, we recommend considering one of the following providers for the expertise you will need to make proper use of your digital assets. Whether your information is largely held in static archives and silos or arrives through real-time ingestion from multiple sources, one of them may hold the key to unlocking its value.
Read the following round-ups and follow the links: your journey starts here.
The InterSystems IRIS data platform forms the basis of the InterSystems smart data fabric, which is optimised for highly performant data ingest and processing, much of which can take place on the fabric itself. A cloud-first data platform for building high-performance, machine learning-enabled applications that connect data and application silos, IRIS puts the “smart” in smart data fabric.
It breaks down data and application silos on-premise and across public clouds and presents a unified abstraction of the full information resource. It significantly outperforms competing alternatives, including entirely in-memory databases, but unlike the latter, it is cost-effective at scale and reliable for both data persistence and availability/uptime. By presenting the enterprise with a canonical view of its complete data asset, it gives multiple business areas access to real-time information, very high query rates, and complex processing simultaneously. According to feedback from Gartner’s 2020 Peer Insights Customers’ Choice Award, that presents “a real competitive advantage.”
The InterSystems IRIS data fabric simplifies existing architectures, integrates with legacy stacks, provides analytics, supports resource-efficient near-real-time and real-time applications, and deploys flexibly.
We take a much deeper dive into InterSystems and the IRIS data platform right here on Tech Wire Asia and look at the specifics of the solution’s speed, capabilities, possibilities, and power. Click through to read more.
Snowflake differs slightly from some of the vendors featured here: rather than acting as middleware between query and data, it works on a more centralized basis, gathering data in one place (in effect, a data lake). However, as it is an industry-recognized powerhouse in this sector, we thought its inclusion valid.
The Snowflake platform runs as a service, with the cloud-based offering combining scalable compute, data storage, and client/query handling in one platform. Compute operations are divided and load-balanced on the fly, making it well suited to processes with inherent peaks.
Because it handles burst demand seamlessly and strictly within its own cloud provisions, Snowflake is a popular choice in settings where strict adherence to data governance is paramount, and working with it is, for many data scientists, a standard part of the working day.
An organization’s core data can be replicated on the fly to assure business continuity, which makes Snowflake a good solution for any service operating under SLAs.
With granular scalability of workloads, Snowflake is ideal for businesses looking for data processing as a service.
With Qlik, there is a slight shift in emphasis and expertise: it is first and foremost a provider of actionable business insights from data assets.
The nature of those assets might differ wildly across any large enterprise, from real-time streams through to archives. Qlik offers scalable data warehouse creation (thanks to its multi-cloud capabilities), managed data lakes, and data analysis, the last of which is where most of the value is derived.
Qlik’s platform creates catalogs of data on the fly (including data deriving from streams) and then gives users across the enterprise the ability to query, analyze, and present information according to their purpose.
Data sets are effectively subsets of the lake designed for specific uses, yet Qlik’s backend continually updates and refines the information, parsing, cleaning, and processing seamlessly on the fly.
Analytic functions are supplemented by AI routines, with the company terming this an “associative information engine,” one that self-improves and learns from human interaction. To learn more, head here to watch a demo or read the full profile here.
The traditional sphere for advanced data-crunching platforms is the enterprise-scale organization, but TIBCO also has a range of tools (TIBCO Spotfire and TIBCO Cloud) spun off for smaller businesses.
The TIBCO Connected Intelligence technologies comprise multiple modules for the larger end of the market, including TIBCO ActiveMatrix BPM software, Spotfire visual analytics, and Jaspersoft embedded analytics. For integration in general, there are TIBCO BusinessWorks and TIBCO Messaging software, and for business rules management and event processing, TIBCO BusinessEvents.
Companies use these resources to build both internal and external-facing solutions: providing customers with the real-time information they need or internal staff with operational data via an attractive presentation layer.
It’s worth noting that BI (business intelligence) today means more than a fancy graph or two: TIBCO’s data outcomes are available for query via APIs or standard SQL, making them as useful in DevOps settings as around the boardroom table.
*Some of the companies featured in this article are commercial partners of Tech Wire Asia