Unlocking the real value of data
Factories may have changed in appearance since the first days of the industrial revolution, but the underlying principles have always been the same: take raw materials, refine and alter them, and produce something valuable to sell.
Today's factories are data centers: rows of gleaming storage devices and processing units, fed by power lines and cooled by grid-draining fans, which call up raw materials, process them and produce goods or services of value.
The new commodity of commerce is data. Like oil, or the electricity on which data itself depends, the flow, processing, storage and integrity of zeroes and ones (each represented by a slight difference in voltage) form the new lifeblood that powers whole economies.
Predictions of the amount of data we will have access to in a few years vary depending on which source of prescience we consult. Some put the total data sea at close to 200 zettabytes within the next decade, an amount of data so gigantic it is as difficult to imagine as the size of the universe itself. As the science fiction writer Douglas Adams once put it, in his trademark jocular manner:
“Space is big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it’s a long way down the road to the chemist’s, but that’s just peanuts to space.”
Both the overall quantity and the quality of the data we collate are increasing. Before the rise of the internet age in the 1990s, databases held little more than names, addresses, gender and, potentially, a skeleton history of purchases made at individual businesses.
Today, data covers almost every aspect of our online histories, from full financial records to photos, videos, social comments, likes (and dislikes), shares, locations, travel records and so on.
Add to that melting pot the exponentially growing amounts of data to be collected by the next ground troops of the technological revolution: the internet of things (IoT). Soon, the global data ocean will be measured not in zettabytes but in yottabytes. A yottabyte is 10^24 bytes (a one followed by 24 zeroes), or a thousand zettabytes, where a single byte is roughly enough to store one text character.
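To make the jump from zettabytes to yottabytes concrete, here is a minimal sketch of the unit arithmetic. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the binary units (zebibyte, yobibyte) step up by 1,024 instead. The names `UNITS` and `bytes_in` are illustrative, not taken from any library.

```python
# Decimal (SI) byte units, smallest first; each step up is a factor of 1,000.
# Assumption: SI prefixes, not the binary (1,024-based) ones.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Number of bytes in one decimal (SI) unit, e.g. 10**21 for a zettabyte."""
    return 1000 ** UNITS.index(unit)


# The projection of roughly 200 zettabytes, expressed in yottabytes.
projected_zettabytes = 200
print(projected_zettabytes * bytes_in("zettabyte") / bytes_in("yottabyte"))  # 0.2
```

In other words, even the 200-zettabyte projection is only a fifth of a single yottabyte.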
Data for sale?
What good is all this data, however, if it is not put to use? Since Google first monetized its massive data stores by selling meaningful statistics to advertisers, the anticipated inter-company data markets have perhaps not turned out to be the frenetic trading floors we might have imagined.
The data giants do not sell their data wholesale. Instead, they use it themselves: launching a new, paid-for service appears to be more profitable than selling the data to third parties. After all, the companies collecting the data are in the best position to fine-tune their collection methods to maximize the efficacy of their commercial services.
Companies (and it is the private sector which collects the majority of data) now use data to power real-time translation services, image and voice recognition and bio-authentication methods, and to infer personality traits, build psychological profiles and, controversially, predict voting intentions.
And while data may be released on a limited basis to third parties (for the purposes of "psychological profiling"), the vast majority is kept safely locked away, making the data giants rich in both money and data.
(A notable exception to this siloing of data appears to be China, where the government's insistence that domestic companies base their data centers inside the country may hint at the new-style data wealth the Chinese authorities are building.)
The covetousness of the new data giants brings us neatly full circle to the opening premise of this article: namely, that the modern factory stores, processes and repurposes not raw materials, but raw data.
And, as in the factories of the past, these raw materials of commerce are carefully protected and valued nearly as highly as the goods leaving the factory gates. One thinks of the sugar warehouses of 18th-century Amsterdam and Batavia, whose strongrooms kept out thieves but also protected the raw materials from contamination and spoilage.
Likewise, the data repository of today, however small, is only of value to its owners if the data is kept secure, is not exfiltrated or allowed to deteriorate, and can be found and presented in meaningful ways.
Into the new role of data protectors step the network security specialists, the data governance consultants, the packet shaping hardware manufacturers and a host of others, some of them listed below. New kinds of insurers appear as well, in the shape of the 21st-century assurance specialists: the business continuity (BC) and disaster recovery (DR) facility providers.
Data is moving from internal data centers out into the cloud, or at least partly into the cloud, as hybrid data lakes take shape. But public data centers are only public in the sense that anyone can pay to use them. The data remains the carefully guarded property of its owners, albeit held on leased storage space.
The latest breed of data integrity, discovery, processing and assurance specialists are therefore developing solutions that are platform-agnostic. Data solutions now need a software layer of abstraction to hide the complexity of disparate data points from operators.
Data is becoming more complex in what it represents, how it is stored and treated, and what is done with it by today's commodity traders: modern businesses.
Here at Tech Wire Asia, we've considered four data specialists from slightly different areas of data management. If data is the lifeblood of your business or organization, read on to find out about the services each one offers.
Data recovery is in Datto's DNA. As the topology of modern enterprises has changed (from bare-metal servers to private, hybrid and public clouds), so have its offerings.
The Connecticut-headquartered company operates globally and, wherever its customers' data resides, Datto is very much aware of the vital distinction between disaster recovery (DR) and business continuity (BC).
The former is the process of getting the show back on the road, whereas the more far-reaching BC comprises a series of essential steps which ensure that any losses caused by system downtime (or data unavailability) are kept to an absolute minimum, and preferably to zero.
The company also notes that organizations which plan their DR and BC contingencies actually save money even if no unforeseen event ever strikes. The very act of appraising the "what-if" scenarios required for effective planning often exposes systemic issues which, once addressed, add business value immediately.
To combat expensive downtime (up to US$8,000 a minute for larger organizations), Datto's SIRIS, ALTO and NAS solutions get businesses back up to full speed, wherever the data may be: in-house, in transit or in the cloud. Datto's solutions are tailored to the size of the business and its individual requirements. They can be configured for one-click restores, or set to allow granular combing of snapshots to retrieve and reinstall specific information.
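The quoted upper bound of US$8,000 a minute adds up quickly. A minimal sketch of the arithmetic, assuming a flat per-minute rate (real outage costs vary with time of day, the systems affected and knock-on losses); `downtime_cost` is a hypothetical helper, not part of any Datto product:

```python
# Upper-bound rate quoted for larger organizations (US$ per minute).
# Assumption: a flat rate for the whole outage.
COST_PER_MINUTE_USD = 8_000


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimated cost of an outage lasting `minutes`, at a flat per-minute rate."""
    return minutes * cost_per_minute


# One hour and one eight-hour working day of downtime at the upper-bound rate.
for label, minutes in [("1 hour", 60), ("8 hours", 8 * 60)]:
    print(f"{label}: US${downtime_cost(minutes):,.0f}")
```

At that rate a single hour of downtime costs US$480,000, and a full working day US$3,840,000, which is why shaving minutes off a restore matters.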
To find out more about Datto’s solutions, read the full profile here.
Informatica, a company with a billion-dollar turnover, supplies a range of products for creating, managing, securing and utilizing modern data lakes. Data lakes form when disparate data sources are aggregated: brought together either physically or in a virtual, abstracted sense.
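The "virtual" form of aggregation can be pictured as a common interface laid over each source, so that callers see one stream of records without caring where each record lives. The sketch below is purely illustrative: `DataSource`, `LegacyDatabase`, `CloudBucket` and `VirtualLake` are hypothetical classes standing in for real connectors, and this is not how any Informatica product is implemented.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class DataSource(ABC):
    """Hypothetical common interface that hides where records actually live."""

    @abstractmethod
    def records(self) -> Iterator[dict]:
        ...


class LegacyDatabase(DataSource):
    """Stands in for an on-premises system that returns rows as tuples."""

    def __init__(self, rows: list[tuple[str, float]]):
        self._rows = rows

    def records(self) -> Iterator[dict]:
        for customer, spend in self._rows:
            yield {"customer": customer, "spend": spend, "source": "legacy"}


class CloudBucket(DataSource):
    """Stands in for a cloud store that holds key/value objects."""

    def __init__(self, objects: dict[str, float]):
        self._objects = objects

    def records(self) -> Iterator[dict]:
        for customer, spend in self._objects.items():
            yield {"customer": customer, "spend": spend, "source": "cloud"}


class VirtualLake:
    """Aggregates sources virtually: nothing is copied, records are read on demand."""

    def __init__(self, *sources: DataSource):
        self._sources = sources

    def records(self) -> Iterator[dict]:
        for source in self._sources:
            yield from source.records()


lake = VirtualLake(LegacyDatabase([("acme", 120.0)]), CloudBucket({"globex": 75.5}))
print(sum(r["spend"] for r in lake.records()))
```

Physical aggregation would instead copy every record into a single store; the virtual approach trades that duplication for a query-time abstraction layer.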
Companies can, for instance, let Informatica systems root out data repositories right across the full gamut of storage and apps, from legacy systems up to recent cloud acquisitions.
Data, the medium of modern commerce, can then be assessed, cleaned, verified and presented by a range of Informatica products, using some of the very latest enterprise-level processes, including intelligent big data parsing and stream computing.
The problem of an overabundance of data, some of it duplicated in one form or another, will only grow as more devices generate and store data: the internet of things will simply confuse systems that cannot identify, clean, protect and manage the new digital currency.
Business-to-business (B2B) and business-to-consumer (B2C) data exchange can be managed directly, as Informatica systems comply with data standards, both proprietary and open. Users of Azure, AWS and the like will be able to interchange data between different platforms and outside bodies with transparency and concrete assurances of safety.
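Identifying duplicates usually starts with normalizing records so that trivially different copies compare as equal. A minimal sketch, assuming hypothetical customer records with `name` and `email` fields and exact matching on the normalized values; commercial data-quality tools typically go further, with fuzzy matching and survivorship rules, and this is not Informatica's method:

```python
from typing import Iterable


def normalise(record: dict) -> tuple[str, str]:
    """Build a comparison key: trim whitespace and ignore case (hypothetical fields)."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each normalized key, preserving order."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for record in records:
        key = normalise(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": " ada lovelace ", "email": "ADA@example.com"},  # same person, messier entry
    {"name": "Alan Turing", "email": "alan@example.com"},
]
print(len(deduplicate(customers)))
```

Keeping the first occurrence is the simplest policy; a real pipeline might instead merge fields from all duplicates or prefer the most recently updated copy.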
Consistently ranked highly in the Forbes Cloud 100 list, Cloudera provides data and computing services, including training, concerned with massive-scale data manipulation and processing.
The company has been involved with the work of the Apache Software Foundation for many years: its credentials in the open source community are well known, and it contributes significant resources to the various Apache projects, such as Hadoop and Spark, which form the basis of its own services.
It also offers a couple of freely licensed distributions of its products: Cloudera Express, which includes the Cloudera Manager deployment dashboard suite, and a platform bundling multiple Apache projects, including Hadoop, Impala, HBase and several more.
For enterprise users, the company’s offerings are grouped into roughly a dozen paid-for solutions, ranging from data navigation tools to powerful, NoSQL-based database apps.
Cloudera is developing a burgeoning range of enterprise-scale tools that allow businesses to create value from IoT deployments. Customers use Cloudera Enterprise to process the petabytes of data potentially flowing from complex, intelligent "things" deployments, from smart buildings and self-driving car sensors to industrial machinery monitors and attenuators.
Like those of some of its competitors, Talend's developers are heavily involved in the open source community, contributing significantly to the Apache projects that underpin the messaging and web application solutions on which Talend's offerings are based.
In business terms, Talend's products support personalized messaging for B2C communications and a range of data analytics systems aimed firmly at marketers. Tools are available to aggregate geolocation and social data, among other rich sources, allowing unparalleled insights into customers' mindsets.
Talend also offers its own certification frameworks, with accreditation for data integrators available in development and managerial streams.
Depending on the tasks at hand, clients can mix and match different processing, collation and management methodologies, based on batch and stream processing. Deployment can be either on-premises or in the cloud. Talend Data Fabric generates Java, Spark and/or SQL code with optimizations for big data processing baked in.
The Talend integration system fully embraces a range of open source technologies, including MongoDB, Hadoop and Cassandra, and the company's involvement in these and similar projects helps ensure that its customers' data systems will not become obsolete.
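The difference between batch and stream processing is easiest to see on a single calculation. In the sketch below, a batch job waits for the complete dataset and computes an average once, while a streaming job updates its answer as each reading arrives. The functions and sensor readings are hypothetical illustrations, not code generated by Talend Data Fabric.

```python
from typing import Iterable, Iterator


def batch_mean(readings: list[float]) -> float:
    """Batch: the whole dataset is available up front; compute the answer once."""
    return sum(readings) / len(readings)


def stream_means(readings: Iterable[float]) -> Iterator[float]:
    """Stream: keep a running total and emit an updated mean after every reading."""
    total = 0.0
    count = 0
    for value in readings:
        total += value
        count += 1
        yield total / count


sensor_readings = [2.0, 4.0, 6.0]
print(batch_mean(sensor_readings))          # 4.0
print(list(stream_means(sensor_readings)))  # [2.0, 3.0, 4.0]
```

Both approaches reach the same final figure; streaming simply makes intermediate results available immediately, at the cost of keeping state (here, the running total and count) between events.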