Dark data is the resource goldmine you don’t dig into
Imagine paying US$100 for 16k of RAM. That’s not a misprint, by the way: 16,384 bytes of eight-bit data, enough to hold a reasonably short, fifteen-paragraph document made up of ASCII characters.
If that sort of cost were a reality today (as it was up to the mid-1980s), then the strictures we’d impose on ourselves and our businesses with regard to data would be far more stringent: determining real value, prioritizing what to keep and valuing every bit of every byte.
Clearly, in today’s silicon-cheap reality, we can literally afford to be data-rich. The benefits are all around us, from machine learning’s use in facial recognition for security systems to advanced, proactive analytics and business intelligence. As a result, we create and retain more data than we know what to do with.
Some estimates put so-called secondary data at 60 to 80 percent of the total data held by the average enterprise. The term once referred only to backups and failover copies of data, but since data’s presence became all-pervasive, this data “lake” also includes test data, copies of databases used in development operations, and hard drive and tape archives.
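To put that price in today’s terms, here is a rough back-of-the-envelope sketch. It simply scales the US$100-per-16k figure quoted above; the function name is illustrative, and the calculation ignores inflation and the difference between RAM and disk or tape pricing.

```python
PRICE_PER_16K_USD = 100        # figure quoted in the article
BYTES_PER_16K = 16 * 1024      # 16,384 bytes


def cost_at_1980s_prices(num_bytes):
    """Cost in US dollars of num_bytes of RAM at US$100 per 16,384 bytes."""
    return num_bytes / BYTES_PER_16K * PRICE_PER_16K_USD


GIB = 1024 ** 3  # one gibibyte
TIB = 1024 ** 4  # one tebibyte

print(f"1 GiB: ${cost_at_1980s_prices(GIB):,.0f}")  # 1 GiB: $6,553,600
print(f"1 TiB: ${cost_at_1980s_prices(TIB):,.0f}")  # 1 TiB: $6,710,886,400
```

At that rate, a single terabyte would cost several billion dollars, which is why nothing would be kept “just in case.”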
Use of the term lake in the context of secondary data suggests a single repository of information, from which interested parties can easily draw, for instance, insightful business intelligence, or a copy of a failed cluster. The truth, unfortunately, is somewhat more distributed, if not actually fragmented.
Most enterprises’ data is spread right across multiple silos, from small shared facilities in individual departments, up to massive off-site backup or archiving facilities. These repositories of secondary data are growing at an incredible rate – by up to 800 percent in the next three to five years, according to Gartner.
Modern storage technologies allow massive scale-out of storage and archiving; and of course, the data gathering processes are becoming more and more automated: information is now often retained by default, a situation which is possible due to the ubiquity of storage media, and their low cost overall.
As is the tendency of the tech industry, a catchphrase developed: big data. Just a few years ago, big data processing and research were the remit of high-end research groups in academe and large enterprises. However, even small organizations today hold terabytes of data, sometimes for compliance reasons, sometimes in case they need to restore or roll back. Often, too, data is just “parked,” rarely accessed or used, but available should the need arise.
As resources become available, the so-called primary data and applications are given the prime storage and compute facilities. New-generation NAND drives in hyperconverged data centers ensure that business-critical data and processes are (rightly) prioritized. And because secondary apps and data (literally) come second in the priorities of the IT function, they become distributed, and often duplicated, or silo-ed.
By deploying proper management systems for this secondary resource of so-called “dark data,” businesses will gain in several ways. Among them:
- Capability to cope with the future. Scale-out storage makes adding capacity seamless, and both cost- and time-effective. The rate at which data is being kept is unlikely to fall, so the overall cost of holding secondary data looks set to rise.
- Lowering costs. Silo-ed data is difficult to access and process to derive useful business value. Additionally, data is often not deduplicated, compressed or rationalized intelligently. Organizations are therefore paying more for their archives than they should.
- Derivation of value. Imagine a horrendously untidy home study, where documents are simply piled up. Data is kept, but sifting through it for the required information is near-impossible. While the “archive” exists, it is of little practical use. With adequately managed secondary data and applications, the enterprise can use all its data to glean business intelligence: trend analysis, marketing insights, product refinement pointers. The list is long.
- Avoiding the lawyers. Data security is big news and is increasingly the subject of legislation and industry stricture. Secondary data is often held insecurely (it’s seen as low-priority) yet represents as much of a security issue as its more-carefully managed primary cousin. Public relations disasters or compliance infringements are blind to storage type or medium – primary or secondary data are often as incendiary as one another in the wrong hands.
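To make the cost point in the list above concrete, here is a minimal sketch of what deduplication and compression do to repeated backup copies. It is an illustration only, not how any product named in this article works: it assumes fixed-size chunks, SHA-256 fingerprints and zlib compression, and all function names are hypothetical.

```python
import hashlib
import zlib


def dedupe_and_compress(blobs, chunk_size=4096):
    """Store each unique fixed-size chunk once, compressed with zlib.

    Returns (store, recipes): store maps a SHA-256 digest to the compressed
    chunk; recipes holds, per input blob, the digests needed to rebuild it.
    """
    store = {}
    recipes = []
    for blob in blobs:
        recipe = []
        for start in range(0, len(blob), chunk_size):
            chunk = blob[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:  # only previously unseen chunks cost space
                store[digest] = zlib.compress(chunk)
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Rebuild one blob from its list of chunk digests."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


def reduction_ratio(blobs, store):
    """Logical bytes written divided by physical bytes actually stored."""
    logical = sum(len(blob) for blob in blobs)
    physical = sum(len(chunk) for chunk in store.values())
    return logical / physical if physical else 1.0


# Three "backups": two identical copies and one with a small addition.
backup = b"quarterly report " * 1000
blobs = [backup, backup, backup + b"appendix"]
store, recipes = dedupe_and_compress(blobs)
print(f"reduction ratio: {reduction_ratio(blobs, store):.1f}x")
```

The second copy adds nothing to the store, and the third adds only its changed final chunk, which is the effect that makes unmanaged, duplicated silos more expensive than they need to be.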
The latest generation of secondary data and application management systems ensures that companies can prioritize their data needs properly. Consolidating data and reducing complexity, while providing a secure environment, is the first step.
Massively reduced storage and management costs (savings which will increase as data storage demands rise) mean businesses will be future-proofing their storage, archiving and management at a cost which justifies the retention of more and more data.
Once its data is properly managed, the enterprise can begin to use the entirety of its data resources to real effect. Any database administrator knows the power of indexing even a single table. Even direct, “route one” management methods like indexing make data retrieval and use simple and configurable.
Finally, it’s worth noting that any overarching secondary data management system needs to be integrated into existing systems, via APIs or other connectivity methods. Next-gen solutions in this area do not create new archives in proprietary formats. Rather, they provide a powerful, homogeneous platform from which all the organization’s data silos can be overseen.
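As a small illustration of the indexing point, the sketch below uses Python’s built-in sqlite3 module to show how one index changes the way a lookup is executed. The table, column and index names are invented for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan steps for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the detail text


def build_archive(n_rows=10_000):
    """Create an in-memory table standing in for an unmanaged archive."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archive (id INTEGER PRIMARY KEY, customer TEXT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO archive (customer, payload) VALUES (?, ?)",
        ((f"customer-{i % 500}", f"record {i}") for i in range(n_rows)),
    )
    return conn


LOOKUP = "SELECT payload FROM archive WHERE customer = ?"

conn = build_archive()
plan_before = query_plan(conn, LOOKUP, ("customer-42",))  # full table scan

conn.execute("CREATE INDEX idx_archive_customer ON archive (customer)")
plan_after = query_plan(conn, LOOKUP, ("customer-42",))   # index search

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, SQLite has to scan every row to answer the query; afterwards it searches the index instead. The same principle applies, at much larger scale, to cataloguing secondary data.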
Here at Tech Wire Asia, we’ve listed four providers of data and application management systems we consider to be worth evaluating to bring your “dark data” into the light.
Cohesity is redefining enterprise infrastructure for secondary data and applications with one software-defined, hyperconverged solution for backups, files, objects, test/dev, and analytics.
Global 2000 companies and federal agencies are modernizing and unifying their non-mission-critical global secondary data and applications on a single web-scale solution, and eliminating unnecessary data silos.
As a modern, cloud-first architecture, Cohesity DataPlatform furthers enterprise objectives to leverage the elasticity and economic efficiencies of public cloud while staying in control. Cohesity includes native integration with the most popular public clouds, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud, for multiple use cases.
Inspired by web-scale architectures, DataPlatform is a scale-out solution based on a unique distributed file system, SpanFS. Cohesity DataPlatform stops mass data fragmentation and spans silos by converging secondary data on a single platform.
Because DataPlatform is a software solution, it works efficiently on-premises on qualified Cisco, HPE or Cohesity hyperconverged secondary storage appliances, as well as in the public cloud.
To learn more about Cohesity, read the full article here.
From its beginnings as the go-to supplier of networking hardware, Cisco is now a supplier of fully hyperconverged data center technologies, ranging from software to hardware management systems.
It is one of the few suppliers that can provide everything the enterprise needs for a full implementation of hyperconvergence, the abstraction of traditional data center hardware and software.
Where Cisco differs from Cohesity (see above) is that the Californian giant’s solutions span most areas of the modern data-oriented organization: if you have a problem, it can find a solution. Its approach is therefore broad-brush rather than specialist.
Today’s Cisco offers simple agility and scalability, with monitoring and control into a variety of systems (not just its own) via APIs, or via tools like HyperFlex Connect.
The company’s HX Data Platform is a scale-out file system designed specifically for hyperconverged environments, intended for the most demanding of applications and data retrieval purposes.
It includes built-in data replication (for data integrity), on-the-fly deduplication, inline compression, cloning, and snapshots. This is the belt-and-braces approach to data security that has made Cisco a world player.
Find out more about Cisco here.
Hewlett Packard Enterprise (HPE) is a world leader in enterprise-scale data infrastructure; nearly all the Fortune 500 manufacturing companies, for instance, use its products.
Its storage and hyperconverged solutions offer a level of flexibility that was not possible just a few years ago, when installations were limited by physical constraints and end-of-life (EOL) meant buying new hardware.
HPE’s latest offerings provide software-based IT management, ensuring scalable network and data resource management.
The company promotes its products as ITaaS (IT-as-a-service), stating that the speed of provisioning it can achieve is suitable for even the most variable workload. Overall control and management remain centralized despite changing demands, and this flexible, as-a-service provision maintains the company’s position as a strategic player.
In the secondary data or backup space, HPE’s hyperconvergence technology improves recovery point objectives (RPOs) and reduces backup and disaster recovery times to mere seconds.
HPE’s solutions reduce the physical storage needed to hold a given amount of logical data, lowering running and capital costs for organizations of any size.
Find out more about HPE’s solutions at the following link.
One of the co-founders of Nutanix now runs Cohesity (see above), and Nutanix itself goes from strength to strength, in line with enterprises’ rush to the cloud.
With many large enterprises today developing multi-cloud (or hybrid cloud) provisions, Nutanix addresses the problems of distributing data over cloud and bare-metal systems.
The company also offers solutions for edge computing, internet of things management, and overall data collection and processing.
Nutanix encourages its clients to undertake hyperconvergence projects as a testbed for the enterprise at large, as its solutions can be scaled very rapidly and simply. Its solution unites clouds right across disparate geographies and provides a central management console which creates an overarching structure of what might be a complex deployment.
Nutanix paved the way with hyperconverged infrastructure, an industry trend which many better-known suppliers have since followed, Hewlett-Packard and Dell among them.
With increasing amounts of data powering organizations, many are looking to the originators of infrastructure abstraction to provide fully scalable data management systems.
To find out more about Nutanix, click the link here.
*Some of the companies featured in this editorial are commercial partners of Tech Wire Asia.