Big Data Is Dead: Long Live Smart Data
Source – https://www.forbes.com/
Campbell Brown is CEO & Co-Founder of PredictHQ – Demand Intelligence for a dynamic world. Kiwi, family man, lover of travel, data & APIs.
For more than a decade, big data has been steadily soaring. New data-driven companies have emerged and become multibillion-dollar juggernauts, while established market leaders recognized the power of data early and have invested accordingly. But as with so many things, 2020 was a wake-up call for data strategies, especially the many that were not delivering immediate value.
I think it’s time that we accept the limitations of big data and embrace the need for smart data. The shift toward smart data has been going on for at least a decade. The central idea behind my own company is to equip companies with the smart data they need to improve their demand forecasting.
While accurate external data is one of the few factors that can bring certainty to your planning, many data scientists report spending around 80% of their time cleansing, verifying and preparing data. This new era of smart data (data that is already cleaned, verified, featurized and ready to be plugged into a model so it starts delivering value swiftly) is rich in possibility. Smart data is configured to enable models to find and use the most impactful data faster, so they learn how the world works and make better decisions. The companies that make the most of smart data will likely be those that shake off the big-data ways of thinking ASAP.
How Did We Get To Drowning In Big Data?
For decades, companies have poured money into data sources and pumped more information into their data lakes. But new data does not always equal new intelligence—to machine learning models and especially to core business strategies—and slowly the weight of all this data has built up.
Because most data sources need cleansing and standardizing, expensive employees wielding multiple postgraduate degrees have often found themselves spending most of the week tidying up data. This, coupled with the complexities of producing useful machine learning models (nearly 90% of data science projects never enter production), means that a lot of data science investment hasn’t amounted to big wins for the bottom line.
Enter the pandemic. As finance teams ran the ruler over everything, any nice-to-haves had to go. The right data projects are make-or-break, but anything that couldn’t be used to improve core functions such as demand forecasting, pricing or driving competitive advantage couldn’t be justified. The more-data-is-better era is over—it is time for a new era of smart data.
Aggregating And Using Data At Scale Is Only Half The Answer
The most important part is extracting value. How do you make your data work for your goals, right now? The analysis paralysis of the last 10 years has to end.
Big-data strategies have been trying to boil the ocean for too long. These maximalist approaches don’t work because they’re way too hard. They never did, and they definitely don’t when you have fewer team members, tighter margins and your demand forecasts rely on last year’s data. From what I’ve seen, the best businesses now are focusing less on the depth and breadth of their data lake and more on getting the most value out of it.
It’s time to flip our data strategies from paralyzing to enabling—to take that asset and turn it into something you can get value from. Right now. Set your data scientists free to do the work they dream about: not collecting, aggregating and cleaning, but building models to tap into signals over noise for core processes such as labor optimization and price forecasting.
McKinsey’s chief data officer and one of the company’s partners put it well in an article from February: “Many companies have made great strides in collecting and utilizing data from their own activities. So far, though, comparatively few have realized the full potential of linking internal data with data provided by third parties, vendors, or public data sources. Overlooking such external data is a missed opportunity. … The COVID-19 crisis provides an example of just how relevant external data can be. In a few short months, consumer purchasing habits, activities, and digital behavior changed dramatically, making preexisting consumer research, forecasts, and predictive models obsolete. Moreover, as organizations scrambled to understand these changing patterns, they discovered little of use in their internal data. Meanwhile, a wealth of external data could — and still can — help organizations plan and respond at a granular level.”
As you look forward, you can use data to inform dynamic decision making and make models more accurate, providing certainty in the Covid chaos. You can use data to enable smart decisions about efficiencies and opportunities hiding in plain sight, and you can do that immediately rather than after three months of a team of five data scientists working on it. This is the power of smart data.
What Defines Smart Data?
As companies learn to do more with less, automation and machine learning become critically important. Smart data starts with reliable and verified data, but it’s more than just a record of truth. It needs to be enriched, contextualized and featurized so that it’s no longer simply raw information (however high-quality). This reduces the friction and error-prone nature of feeding new data into machine learning models.
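To make "featurized" concrete, here is a minimal sketch in Python of turning raw event records into per-day numeric features that a demand model could ingest. Everything in it is an assumption for illustration: the records, the field names (category, day, attendance) and the featurize helper are hypothetical and do not represent any particular vendor's schema or API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw event records; titles, field names and values are
# illustrative assumptions, not real data.
RAW_EVENTS = [
    {"title": "City Marathon", "category": "sports", "day": date(2021, 5, 1), "attendance": 20000},
    {"title": "Jazz Festival", "category": "festivals", "day": date(2021, 5, 1), "attendance": 5000},
    {"title": "Tech Meetup", "category": "conferences", "day": date(2021, 5, 2), "attendance": 300},
]


def featurize(events):
    """Aggregate raw event records into one numeric feature row per day."""
    rows = defaultdict(lambda: {"event_count": 0, "total_attendance": 0})
    for event in events:
        row = rows[event["day"]]
        row["event_count"] += 1
        row["total_attendance"] += event["attendance"]
    return dict(rows)


features = featurize(RAW_EVENTS)
for day in sorted(features):
    # Each row is already numeric, so it can be joined to a sales history by date.
    print(day.isoformat(), features[day])
```

The point of the sketch is the shape of the output rather than the aggregation itself: one row per day, all numeric, ready to be joined to internal data without further cleaning.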
As you are assessing every new data source, you should be asking of it:
• Does this data have enough depth to give context for the issues I am building models to solve?
• How verified and accurate is the data?
• How frequently is it updated and re-verified as we head into a long-awaited but chaotic recovery?
• Is it enriched enough that my models will know what to do with its input?
• Which core business decisions will this enable my machine learning models to make better and faster?
• Can I easily explain what this data source is showing, and its impact on my forecasts and operations?
Last year, the ultimate black swan event left many companies flat-footed. Many are now working hard to integrate new data and roll out data-driven recovery plans so they know what will drive their demand. Whether it’s a televised sports game or festival driving up demand, a flood or terrorist attack driving down demand, or clusters of smaller events bringing people together at a scale unforeseen by most companies, many companies are on it. And I think every company needs to be.