The Line Between Commercial and Industrial Data Science
Source – informationweek.com
It’s no secret that investing in data can lead to major benefits for organizations. Not only is data vital to success; companies that utilize insight-driven practices are twice as likely to be market leaders within their industries. When you combine that perspective with the fact that upward of 80% of all collected data goes unused, the possibilities really begin to present themselves.
But data science practices are vastly different when you compare industrial and commercial organizations. For each, data sets take on different forms, from the frequency of data inputs to the costs of experiments and models to, most critically, the impact associated with the resultant insights, such as the cost of failure.
Commercial data scientists: Keeping users engaged
If you’ve ever done your banking or shopping online, then you’ve reaped the benefits of commercial data science. Data scientists in the commercial sector leverage data to optimize consumer engagement for online users. The practice has even evolved from purely retrospective testing into real-time testing, helping organizations better understand end-user decision making and quickly shift tactics to achieve their desired outcomes.
With mounds of accessible data available, the commercial sector is transforming online engagement strategies in advertising, purchasing, shipping, and community feedback. According to Gartner, by the year 2020 more than 40% of data science tasks will be automated, which will increase productivity and usage of data and analytics by citizen data scientists.
We know that on a given day, there could be millions of visits to a company’s website, creating numerous events for data scientists to optimize using real-time experimentation. At the same time, these experiments typically don’t impede standard traffic and engagement processes while they are being run. In commercial data science, the technology is advanced and the data is so accessible and uniform that automation of some use cases is relatively plausible and could replace many traditional data scientist roles.
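To make the idea of real-time experimentation concrete, here is a minimal sketch of a two-variant page test that adapts as click data arrives. The variant names, click-through rates, and epsilon-greedy policy are illustrative assumptions, not details from the article:

```python
import random


class EpsilonGreedyTest:
    """Minimal real-time experiment: mostly serve the best-performing
    page variant, but keep exploring with probability epsilon."""

    def __init__(self, variants, epsilon=0.1):
        self.epsilon = epsilon
        self.shows = {v: 0 for v in variants}
        self.clicks = {v: 0 for v in variants}

    def _rate(self, v):
        # Observed click-through rate so far (0.0 before any shows).
        return self.clicks[v] / self.shows[v] if self.shows[v] else 0.0

    def choose(self):
        # Explore a random variant epsilon of the time...
        if random.random() < self.epsilon:
            return random.choice(list(self.shows))
        # ...otherwise exploit the variant with the best observed rate.
        return max(self.shows, key=self._rate)

    def record(self, variant, clicked):
        self.shows[variant] += 1
        self.clicks[variant] += int(clicked)


# Simulated traffic with hypothetical true click rates per variant.
random.seed(0)
test = EpsilonGreedyTest(["A", "B"], epsilon=0.1)
true_rates = {"A": 0.05, "B": 0.12}
for _ in range(5000):
    v = test.choose()
    test.record(v, random.random() < true_rates[v])
```

Because the better variant accumulates a higher observed rate, traffic shifts toward it during the experiment rather than after a retrospective analysis, which is the key contrast the paragraph above draws.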
Industrial data scientists: The true data wranglers
While the industrial sector is becoming more automated on the plant floor or across the grid with advanced equipment and sensors, data science practices differ significantly from those in the commercial sector today.
Big data helps industrial organizations manage the deluge of information stemming from connected assets and sensors. But the growth of these connected sensors and connected equipment, especially when combined with many of the legacy systems that have their own critical data elements, has created quite an abundance of messy, unstructured data. While data has become essential in measuring risk, improving asset performance and reducing costs, it requires much more hands-on management and analysis than that of the commercial sector due in large part to the work needed to properly prepare and make sense of the data.
An important differentiator for industrial data science is that it relies on time series signals from machines. Engineers are accustomed to working with successive measurements made over a specific time interval, which restricts data scientists’ abilities to conduct experiments. With significantly less data than that produced in the commercial sector, industrial data scientists and reliability engineers must conduct closely controlled experimentation, often after equipment failures occur.
If a machine breaks down only once a year, for example, there will be a very small sample of data available to identify cause and solution. As a result, data scientists and engineers have to work together, combining physical models with data sets to train machines to predict future failures. Raw machine signals alone aren’t enough; they must be paired with a physical model to derive value from data science.
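One simple way to pair a physical model with sparse sensor data, in the spirit of the paragraph above, is residual monitoring: compare each observed reading against a physics-based expectation and flag readings that drift beyond a tolerance. The bearing-temperature model, its coefficient, and the threshold below are hypothetical illustrations, not equipment specifics from the article:

```python
def expected_temp(load, ambient, k=0.8):
    """Hypothetical first-order physical model: steady-state bearing
    temperature rises linearly with load above the ambient temperature."""
    return ambient + k * load


def residuals(loads, ambients, observed):
    """Difference between each observed reading and the model's prediction."""
    return [obs - expected_temp(l, a)
            for l, a, obs in zip(loads, ambients, observed)]


def flag_anomalies(res, threshold=5.0):
    """Indices where the residual exceeds the tolerance, i.e. where the
    machine's behavior departs from the physics-based expectation."""
    return [i for i, r in enumerate(res) if abs(r) > threshold]


# Three readings under identical conditions; the model predicts 28 degrees.
loads = [10.0, 10.0, 10.0]
ambients = [20.0, 20.0, 20.0]
observed = [28.0, 28.5, 40.0]  # the last reading drifts well above the model
res = residuals(loads, ambients, observed)
```

Even with only a handful of readings, a residual against the physical model can surface a developing problem that raw signal values alone would not, which is why the domain model and the data have to be used together.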
To fully embrace data in asset-intensive industries, organizations require unique experience, engineering input, and machine knowledge. Industrial data scientists must resolve data entry issues to achieve high-quality analytics and avoid catastrophic asset failure, so they need unique skill sets beyond statistics to effectively create new strategies for machine maintenance and optimization. They also need to be relatively well versed in three distinct but interrelated data science pillars: physics-based (heavy domain expertise needed); digital (using past events to better predict future outcomes); and empirical (operator/expert knowledge). This last element is a vital but often overlooked component in successful outcome realization. In industrial settings, the difference between data relationships that “could” correlate and data that is actually causal can be extremely subtle. The keys to unlocking those subtleties are often found in the minds of the critical user communities.
As you can see, data scientists wear many hats: they must of course be talented with numbers, but they also need to be able to contextualize the data they’re analyzing. Taking qualitative phenomena and quantifying them in a meaningful way is an art. However, virtually everything can be modeled into a mathematical story, and the ability to look at data sets and develop strategic insights with a business mindset is what makes data scientists so valuable.
These days, both the commercial and industrial sectors have more advanced software and computing capabilities at their fingertips. While they have their own intricacies, increased automation and progressive techniques are transforming each from reactive to proactive strategies. In the end, this is what will not only impact the bottom line but provide the means to true outcome realization and business transformation.