Moving data science from the lab to the boardroom: three next-gen BI vendors
There’s an old piece of advice for anyone starting a woodworking project: measure twice, cut once.
While that type of care and attention might not be mandatory when working with digital information, the proportion of time spent on preparation before the final ‘cut’ is probably about right. Any data scientist knows that a sizeable share of their working day goes on normalizing and preparing data. An analyst working in a large organization will add that finding relevant data in the first place takes up yet more time before analysis can begin.
Putting aside for the moment some of the specific challenges of finding information (more on that later), the need for clean data comes down to the nature of computers. Our silicon helpers are immensely fast and never tire, but as anyone who’s ever written a line of code knows, a single misplaced period or semicolon can render hours of work useless, and turn a five-thousand-dollar workstation into an expensive paperweight!
Getting the right data and normalizing it are the mainstays of most analysts’ professional activities. But as the amount of data available to the enterprise grows, the standard 80:20 split of time and resources that’s typical of most analysts’ work (roughly 80 percent on finding and preparing data, 20 percent on the analysis itself) becomes less tenable. And as more sources become available (partly because of better API integration in new digital platforms), most data analysis professionals will be more aware than ever of that ratio and what it means for their department’s bottom line.
Locating at scale
In large businesses and organizations with multiple stakeholders, often on different sites (or even continents), there’s an increased tendency for work to be replicated. For analysts, that can be particularly irksome and expensive. But without a cohesive and collaborative data platform that’s used right across the board, that situation is an unfortunate reality. The duplication also reaches further down the chain of a typical analysis project: the same elements may well have been located, cleaned, processed, modelled and even presented already, perhaps several times.
Creating catalogs of data is clearly important, but equally relevant is keeping track of the tools and methods used, the modelling techniques laid out, and even the individual R or Python scripts logged. The three vendors of data analysis platforms we highlight below are all, it is fair to say, capable of providing such a function, in varying mixes that will appeal in different ways according to individual use cases.
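To make the idea of a catalog concrete, here is a minimal Python sketch of how logging data sets, methods and scripts in one shared place might surface duplicated work. It is an illustration only and does not represent how any of the vendors below implement cataloging; the class names, field names, data set names and script paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One logged piece of analysis work (all field names are hypothetical)."""
    dataset: str  # name of the source data set
    method: str   # what was done, e.g. "cleaning" or "regression model"
    script: str   # path to the R or Python script that did it
    owner: str    # team or site responsible for the work


@dataclass
class AnalysisCatalog:
    """A shared, in-memory list of entries that every team registers against."""
    entries: list = field(default_factory=list)

    def register(self, entry):
        """Log an entry; return earlier entries with the same data set and method."""
        duplicates = [
            e for e in self.entries
            if e.dataset == entry.dataset and e.method == entry.method
        ]
        self.entries.append(entry)
        return duplicates

    def find(self, dataset):
        """Return every logged entry that touches the given data set."""
        return [e for e in self.entries if e.dataset == dataset]


# Two teams on different sites log the same cleaning job on the same data set.
catalog = AnalysisCatalog()
catalog.register(CatalogEntry("sales_2021", "cleaning", "clean_sales.R", "London"))
earlier = catalog.register(
    CatalogEntry("sales_2021", "cleaning", "prep_sales.py", "Singapore")
)

for entry in earlier:
    print(f"Already done by {entry.owner}: {entry.script}")
```

In this sketch the second team learns at registration time that the work already exists, which is the kind of replication a shared catalog is meant to prevent.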
Similarly, each platform included below is capable of assimilating and deploying archive data and live stream data from the multiple sources in daily use by enterprise systems: from on-premise applications and services to substantial cloud deployments, to individual Excel sheets, to IIoT devices streaming information over new 5G networks.
Building models at scale
The range of tools that different analysis teams might use is as diverse as the people that make them up. But in an increasingly digitized organization, stakeholders whose job title certainly doesn’t contain the word “Analyst” want access to self-service portals from modern data solutions to visualize and model highly specific data sets. That type of “citizen analysis” is massively important to the strategic growth of the organization, because it allows business function specialists to access critical information as quickly as possible, without massive preparation times.
Mapping workflows and tool utilization is therefore highly relevant, whether that’s in predictive analytics, spatial analysis using third-party tools via APIs, bespoke Python routines, or even machine learning toolsets developed for specific uses.
The key here is repeatability, and the need for an open, collaborative record of how data is sourced, cleaned, processed, and presented. In the same way that IT professionals have seen their role morph into that of strategic enablers for the larger enterprise, data scientists at all levels are empowerers: giving the business functions the information they need to make decisions based on empirical information. Insight-givers, in other words.
Show & share
To return to our woodworking example, the ingenious tenon joints and the sub-millimetre accuracy of the routing and fretwork that might be hidden away inside a classical instrument or a piece of beautiful furniture are secondary to its utility. What the end-user needs, and has potentially commissioned, is a musical instrument or a useful and beautiful piece of furniture. That’s of much greater concern than any appreciation of the artisan’s handiwork that goes into the finished object. In the same way, the presentation of statistics, information and data in general is absolutely integral to finding those insights.
In the data analysis world, findings for a line of business are rightly thought to be best presented with the input of that line of business’s stakeholders. This is why self-service data analysis is so powerful: the results are always arrived at with a specific purpose in mind.
Therefore, the catalog of methods and data needs to contain the models, too. This can be informed by feedback from line-of-business managers, so that data scientists are never left working in a vacuum, without guidance for the data team’s work.
The platforms we feature below hit these targets: business-focused data analysis solutions that are capable of sourcing, normalizing, processing, and presenting data, but at all stages keeping data-sharing and collaboration strictly in mind. It’s the meticulous cataloguing of, and collaboration on, the data and analysis methods that creates the type of insights that differentiate the organization from its peers and competitors.
Alteryx
There are several aspects of the Alteryx platform that make it pretty special in this space. For data analysis professionals at any level, a massive amount of thought has gone into integration with the tools and platforms already at hand and in daily use. For the more business-focused professionals, that integrative approach extends to the rest of the technology stack that’s usually in play at enterprise level, including cloud providers via application programming interfaces (APIs), as well as common tools found across the various business functions, such as large enterprise resource planning (ERP) systems.
The platform is big on collaboration, making searching for data and analysis assets quicker, and this open, community basis means there’s little chance of different teams replicating one another’s work. Data sets can be prepped and blended by those happy with writing their own R, or those who are entirely code-averse.
The collaborative approach to big data in Alteryx is very much suited to the enterprise setting: this is data analytics for big organizations, not for isolated data science teams operating in ivory towers! You can read more about Alteryx on the pages of Tech Wire Asia.
Tableau
The Tableau offerings are designed with different user types in mind. Whether your C-suite wants to approach a problem from a macro perspective, or an individual technically minded data scientist needs to write bespoke ML routines on live data streaming from multiple sources, the platform has it covered.
This is achieved by defining various roles in a company: viewers, explorers, and creators. That series of roles may exist in a single team, or may be spread across business functions.
For viewers, there’s the visualization and oversight they need; explorers are more confident moving deeper into the layers of information; creators are the people with “analyst” in their job title.
Tableau can be deployed at massive scale in cloud-based clusters or in private data centres, and the company has split its platform’s availability into four apps: Desktop, Prep, Online & Tableau Server. Learn more about the gold standard Tableau here.
Qlik
Qlik brings a DataOps approach to the entire enterprise, making the discovery, collation, preparation, processing and presentation of data the strategic driver for digitally-focused businesses, right across the world.
The Qlik Data Integration Platform is the complete solution, with various specialist product lines covering SAP integration, data lake creation, data automation and cataloging, amongst others.
Whether it’s for use by data-parsing specialists, or available via intuitive self-service, the information locked away in a business is now ready to provide significant value.
Qlik’s Data Warehouse solutions form a coherent picture that’s of significant practical business value for enterprises, many of which are struggling to get more than a glimpse of the resources they have.
Incidentally, the annual QlikWorld event this year is, perhaps unsurprisingly, an online-only event. Registration is free for the two-day shindig, where you can hear Qlik’s executive team talk about the value they’re planning to bring to the big data analysis world over the next twelve months.
*Some of the companies featured in this article are commercial partners of Tech Wire Asia