Data is supreme and has become increasingly crucial for enterprises

A data visualization display at the Big Data Expo 2018 in Guiyang, Guizhou province, China. (Photo by Jadranko Marjanovic / Xinhua News Agency)

Big Data tips enterprises should consider

In the modern business landscape, data is supreme and has become increasingly crucial for enterprises. Even artificial intelligence (AI) is increasingly powered by big data. The secret lies in the capability to collect, sort through, and collate data from diverse sources.

This capability deepens business insight and enables data-driven decisions that help the business run better. The benefits extend across marketing, internal workflows, and sales.

However, according to computer scientist, database research pioneer, and MIT adjunct professor Michael Stonebraker, there are several things companies should do to build their data enterprises and, just as importantly, mistakes they should stop making or avoid altogether.

Resisting the cloud

One may scoff, but if your organization isn't planning to become cloud-native, you could be backing a losing technology. The cloud is more elastic than an on-premises solution, often more secure, and more cost-effective in the long run.

Firms like Amazon offer cloud storage at a fraction of the cost, with better infrastructure, often tighter security, and staff who specialize in cloud management for a living, according to Stonebraker. "They're deploying servers by the millions; you're deploying them by the tens of thousands," he said. "They're just way further up the cost curve and are offering huge economies of scale."

Simply put, with the cloud a company can use a thousand servers to run its end-of-month numbers and scale back to far fewer for everyday tasks.

Not embracing rocket scientists

Organizations need new talent, and they need to pay for it. Even more than that, they must treat that talent as their guiding lights. Organizations that seek out employees of this caliber and fully embrace them, weird obsessions and bizarre knowledge bases included, will end up with a better return.

HR won’t like what you’re paying, and “they’re not going to wear suits,” Stonebraker said, but don’t drive them away. “They will be your guiding lights.”

Ignoring the real data science problems

It may not be glamorous, but successful data scientists spend 90% of their time on data discovery, data integration, and data cleaning. Without clean data, major big data initiatives mean nothing.

Companies should put a system in place and stick to it, because the rocket scientists they spent money on, and fought with HR to hire, can help lead the way. But the organization still needs to solve the real data problem: data quality. The best way to address this, Stonebraker said, is to have a clear strategy for data cleaning and integration, and to have a chief data officer on staff.

Will traditional data integration solve issues for enterprises?

Traditional data integration isn't going to cut it in the world of big data. The two most common approaches, extract, transform, load (ETL) and master data management (MDM), are too old to work properly and won't scale.

Believing data warehouses will solve all problems

Data warehouses can solve some big data problems, but not all of them. Warehouses don't work for things like text, images, and video, Stonebraker said. Instead, use data warehouses for what they're good at, such as customer-facing, structured data from a few sources.

"Get rid of the high-priced spread and just remember, always, that your warehouse is going to move to the cloud," he said.

Succumbing to the "innovator's dilemma"

Often, legacy systems have to be abandoned, even if that means drastic changes or potentially losing customers. It requires constantly betting on the future and being willing to reinvent the organization. "You simply have to be willing to do that in any high-tech field," Stonebraker said.

New tools shouldn't be outsourced, Stonebraker said, but other things, like maintenance, should be. And while you're at it, don't run your own email system, the professor advises.

Assuming that data lakes will solve everything

Simply dumping data into a lake does not make it clean or integrated. Stonebraker suggests companies clean the data in their lakes with a data curation system. "This problem has been around since I've been an adult and it's getting easier by applying machine learning and modern techniques," Stonebraker said, but it's still not easy, and companies should put their best staff on the problem.

“Don’t use your homebrew system,” he said of in-house technology, which is often outdated. Usually, the best data curation systems come from startups, he said.