
Data is electronic gold: here’s how to drill for it, refine it & sell it
BEFORE the development of the internet and the widespread use of databases, the information held on individuals was scant: perhaps just name, date of birth, profession, and address. Marketing-oriented information was also very limited – a choice of daily newspaper may have been the extent of it, or an income estimate inferred from job or location.
Things change. Generations Y & Z have grown up in a digital landscape, and their online activities have built a highly detailed picture of who they are, what they do and the type of decisions they’re likely to make. Even older readers of this site who managed to exist before apps & the internet pour information about themselves out into the data ecosystem.
Data on all of us comes in two categories:
- Behavioral data: social media activities, internet browsing history, interactions with institutions and companies (such as emails), posts to websites & forums, online reviews, etc.
- Transactional data: purchases made online or in-store involving electronic transactions, making insurance claims, applying for credit, buying/selling investments, taking out mortgages, etc.
When aggregated, the total data set represents a pretty good description of who we are, what we’re worth, and whom we interact with. From that picture, predictions can be made about future behavior: some simple (when home insurance renewal dates fall, for instance), some more complex (which home insurance provider will be most attractive).
BEGINNING WITH HOUSEKEEPING
Every organization sits on a veritable goldmine of information – data is the new oil. But unlike oil, which tends to be literally pooled into discrete areas, data, especially in larger organizations, is spread out. Additionally, it comes in different forms, is accessed in different ways and is available to separate subsets of people and systems.
A process of internal data discovery for the enterprise is, therefore, more akin to the controversial fracking process used to unlock hidden shale gas, rather than drilling an oil well to access an underground lake of the black gold. Gathering the resource together, and making it usable, is a serious undertaking.
For the modern business to either function more efficiently, make more profit, or monetize the data it holds, the process of data discovery is the first step, and a tricky one.
Assessment requires technical know-how – ask any database administrator – and involves the pulling together of databases in different departments, divisions, and even from continents. Then there are the processes of data cleaning and amalgamation to be undertaken – none of this is particularly easy or quick, but there are companies which can help with this process; see one of the four listed below.
Even during the data aggregation and discovery exercise, organizations will discover both gaps in their knowledge and opportunities the data offers. The opportunities are valuable internally (improvements in efficiency and the bottom line) and externally – data of value to other companies, or to other organizations or branches (for instance, different government departments).
Internal data sharing
In larger enterprises, access to data is a careful balancing act. Staff need to access, edit, and delete data as part of their work, but granting overly extensive privileges can be problematic. Striking the right balance is an ongoing process of trial and error.
The framework used to share data is therefore important. Any system needs to be not only usable (if it isn’t simple, attractive, and capable enough, it’ll languish) but also powerful.
Carefully cleaned data should not be duplicated to anonymize it, for instance. Rather, the framework needs to be able to present specific subsets of data according to its users’ privileges – in real time.
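The idea of serving privilege-specific views of a single canonical dataset, rather than duplicating anonymized copies, can be sketched in a few lines. This is a minimal illustration only – the role names, columns, and filtering rules below are assumptions invented for the example, not any particular product’s policy model.

```python
# Minimal sketch: present role-specific subsets of one canonical dataset
# instead of duplicating anonymized copies. All roles, columns, and rules
# here are hypothetical, for illustration only.

CANONICAL = [
    {"customer_id": 101, "name": "A. Lee", "postcode": "2000", "balance": 5400},
    {"customer_id": 102, "name": "B. Tan", "postcode": "3000", "balance": 120},
]

# Per-role policy: which columns a role may see, plus an optional row filter.
POLICIES = {
    "analyst":   {"columns": ["customer_id", "postcode", "balance"]},  # no names
    "marketing": {"columns": ["postcode"],
                  "row_filter": lambda r: r["balance"] > 1000},
}

def view_for(role, rows):
    """Project and filter the canonical data according to the role's policy."""
    policy = POLICIES[role]
    keep = policy.get("row_filter", lambda r: True)
    cols = policy["columns"]
    return [{c: r[c] for c in cols} for r in rows if keep(r)]

print(view_for("analyst", CANONICAL))    # all rows, names withheld
print(view_for("marketing", CANONICAL))  # only postcodes of high-balance rows
```

Because each view is computed on demand from the single cleaned source, updates propagate instantly and no anonymized duplicates drift out of sync.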
In fact, in large financial institutions, different departments, especially those with particularly stringent governance requirements, may be unwilling to pass data to colleagues – unless the platform is proven as secure.
External data sharing
For organizations that wish to share data with external partners, perhaps in a reciprocal arrangement, the benefits can include significant productivity improvements as well as improved outcomes for customers or service users.
Government departments have historically each gathered the same types of data on their citizens, and this duplication is not only an open door to inaccurate records but also an inconvenience for the data’s subjects.
In the private sector, the applications of data sharing are extensive. A few prime examples might be:
- fraud prevention by pooling data which will show patterns, such as customers who make repeated insurance claims or credit requests.
- real-time shipping data coordinated with retailers to improve customer satisfaction for both parties.
- flight or travel data shared with financial companies to ensure customers’ bank cards are not canceled due to their use overseas.
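The fraud-prevention case above – spotting patterns that only emerge when several parties pool their records – can be sketched briefly. The field names, sample data, and threshold below are assumptions made for the illustration.

```python
# Illustrative sketch: pool claim records from several insurers to flag
# claimants who appear suspiciously often across the combined data set.
# Field names, sample records, and the threshold are hypothetical.
from collections import Counter

insurer_a = [{"claimant": "X123"}, {"claimant": "Y456"}]
insurer_b = [{"claimant": "X123"}, {"claimant": "X123"}]

def flag_repeat_claimants(*datasets, threshold=3):
    """Count claims per claimant across all pooled datasets; flag heavy repeaters."""
    counts = Counter(rec["claimant"] for ds in datasets for rec in ds)
    return sorted(cid for cid, n in counts.items() if n >= threshold)

print(flag_repeat_claimants(insurer_a, insurer_b))  # ['X123']
```

Note that claimant "X123" is unremarkable within either insurer’s data alone; only the pooled view crosses the threshold – which is precisely the argument for sharing.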
Monetizing data
Like any product, quality data is a commodity which can be monetized, albeit within the limits of local, national or continent-wide laws (GDPR and the Privacy Act, to name but two).
Determining the value of data for exchange is a complex subject – one with which the featured suppliers below may be able to help – and will not be covered in detail here. But like some of the precepts and caveats of internal data sharing, both the validity (or accuracy) of the data and the framework on which it is available can make or break the deal between those with data to sell and their prospective buyers.
Any exchange framework needs to be secure, but it also needs to be adaptable. No two usages of a provided dataset are ever the same, and therefore different customers for data will need to be equipped with data exchange mechanisms subtly different from others’.
Therefore, simply advertising that your organization has data for sale is usually not enough. The specific terms of exchange, the detail of each agreement, as well as the means of interchange, all need to be established. Data is rarely placed on the market as a flat file – it needs to remain dynamic and available for multiple contracts’ duration.
Compliance
Australian companies need to abide by the strictures of the Privacy Act. Companies dealing with a European customer (even just one) need to respect GDPR. And no organization wants to be the subject of public horror if data is released to hackers.
Determining what types of data can be released, and about whom and what, are matters for careful consideration, and most organizations will require specialist help on the local requirements for data governance.
Trails of responsibility are complex and will vary from territory to territory – establishing an audit trail, and careful tracking of data is incredibly important. Once data leaves the hallowed confines of an organization’s data repositories, its ownership details are not as simple as many imagine. Where does the responsibility trail stop, for instance? The collector of the data, the reseller, the re-user? Questions like this need to be answered by experts as getting it wrong isn’t an option.
At Tech Wire Asia, we’ve put together four suppliers of data management services which we think should be considered. The new black gold of data isn’t easy to drill for, nor refine nor sell – but the rewards are there. The following companies should be able to help:
DATA REPUBLIC
In Data Republic’s native Australia, the company has gained an enviable position as the leading data exchange platform provider – trusted by major banks, airlines, retailers, and governments to provide secure, privacy-compliant technology to help organizations manage data sharing.
Underpinned by a comprehensive legal framework and with unique privacy protections in place, the whole point of Data Republic’s Senate Platform is to make it simpler and more secure for organizations to license and share data.
Data Republic’s Senate Platform acts like a ‘data sharing control center’ for organizations who want to keep track of the data flowing in and out of their organization. Companies, whose data is stored in secure environments, can customize dataset visibility and approval workflows for internal vs. external parties, govern data requests from the platform’s internal marketplace and control the licensing terms around how data is provisioned and for what purpose.
The Senate Platform delivers obvious upsides for data stewards who want to retain visibility and control over sensitive data sharing across their organization, as well as commercial benefits for Chief Data Officers tasked with balancing data monetization strategy and risk. But the biggest impact is for analysts and insight-seekers who finally have a secure channel and means of accessing and analyzing proprietary, second-party data sets.
Data Republic’s fast-growing data exchange ecosystem is expanding into Singapore and the USA; companies can join for one-off insight projects or license the technology annually to fuel broader data governance, monetization, or data innovation programs.
To find out more about Data Republic, read the full profile here.
ALATION
By connecting to all of your data sources and business intelligence (BI) systems, the Alation Data Catalog provides a single source of reference for all of the data assets in an enterprise.
Alation makes it easy to find the data you need and get answers you can trust. A searchable catalog of assets (tables, schemas, queries) is created automatically in real time. A smart query tool makes proactive recommendations as you write your queries and can be used by business users who aren’t proficient in SQL.
In addition, a combination of machine learning and human collaboration makes it possible to track and monitor how data is being used, providing insights into the relative value of that data.
One unique feature is the ability for any user, regardless of technical experience, to rank data, providing a level of grassroots governance that Alation calls “Governance for Insight”. The ability to include business rules in a catalog means that everyone in your organization who touches data is working from the same definitions. For example, your sales and marketing teams can share the same definition of ‘revenue’.
Alation provides a solid foundation for self-service analytics, BI, and visualization, immediately surfacing any problems occurring in your data pipeline, both within and across various applications.
The Alation Data Catalog can be used either on-premises or in the cloud, and it works with data lake providers such as Teradata, Kylo, and Hortonworks.
PAXATA
Paxata allows its users to intelligently transform raw data into ready information. It recently announced Spring ’18, the latest release of what it terms its Adaptive Information Platform. The latest iteration increases data processing speeds and improves the quality of information destined for analysis and collaboration. Other new enhancements include one-click profiling, rapid data onboarding, and multi-tenancy capabilities.
The AIP’s RESTful APIs also allow Spark SQL-based data input, which opens the platform to a wider range of applications – legacy platform integration is therefore viable.
Single-click profiling creates a detailed data quality “scorecard” in the data aggregation & cleaning phases, detecting potential fuzzy matches, identifying data types, and pinpointing patterns in character strings.
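To make the profiling idea concrete, the kind of per-column “scorecard” described above can be approximated in a few lines: infer a type, measure completeness, and generalize character patterns. This is a generic, hand-rolled sketch – it is not Paxata’s actual algorithm, and the pattern notation is an assumption for illustration.

```python
# Hedged sketch of column profiling of the kind described above: inferring a
# type, a completeness ratio, and character-string patterns for one column.
# Generic illustration only -- not Paxata's implementation.
import re

def profile_column(values):
    """Return a small 'scorecard' for one column of string data."""
    non_empty = [v for v in values if v]
    if non_empty and all(re.fullmatch(r"-?\d+", v) for v in non_empty):
        inferred = "integer"
    elif non_empty and all(re.fullmatch(r"-?\d+\.\d+", v) for v in non_empty):
        inferred = "decimal"
    else:
        inferred = "text"

    def pattern(v):
        # Generalize characters: digit -> 9, letter -> A, keep punctuation.
        return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v))

    return {
        "type": inferred,
        "completeness": len(non_empty) / len(values) if values else 0.0,
        "patterns": sorted({pattern(v) for v in non_empty}),
    }

print(profile_column(["AB-1234", "CD-5678", ""]))
```

Shared patterns such as `AA-9999` are also what make fuzzy matching tractable: two columns with the same generalized pattern are strong candidates for joining or deduplication.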
As any IT professional is aware, the sheer volume of data (never mind its richness) flowing across networks is set to rise exponentially in the next few years, thanks to the internet of things (IoT). Paxata is designed for massive data sets such as those found in large IoT deployments, and huge transaction volumes can be distributed across clusters.
The solution slots neatly into LDAP or SAML authentication & privilege systems, so there is no need for account duplication across large enterprises.
QUBOLE
The Santa Clara-headquartered Qubole came out of Facebook’s engineering teams, and since 2011 the company has concentrated on providing the best ways to process big data.
Qubole’s open-source underpinnings (Spark, Hadoop, Presto) produce an agnostic, cloud-based platform equally at home on Azure, AWS, and Google Cloud apps and databases, and the platform is available at a range of price points suitable for smaller organizations as well as large, globe-straddling multinationals.
The company’s autonomous data platform (ADP) provides a trio of information which it acronymizes as AIR – alerts, insights & recommendations.
While alerts and monitoring will be familiar to anyone who’s ever started a server daemon, recommendations are driven by a proprietary artificial intelligence (AI) engine, which aids in data formatting (columnar/compressed, etc.) and modeling, as well as auto-suggesting inputs to users.
The attraction of Qubole’s big-data-as-a-service is that it is always online, is self-healing, and removes the need for companies to monitor and improve their own big data infrastructures. By taking over the day-to-day running of the modern data lake, Qubole empowers its clients to concentrate on the benefits high-quality data can deliver.
*Some of the companies featured in this article are commercial partners of Tech Wire Asia