Will AI Kill the Data Scientist?

Long before data science became “a thing,” I trained as a “real” scientist, researching arcane and less-than-world-changing topics in physical chemistry. At the heart of that work was data, collected mostly manually during late-night experimental runs in a 1970s lab. That data I punched onto cards for analysis — simplistic by today’s standards — in FORTRAN programs running to thousands of lines of code. Correctly predicting patterns of physical behavior was the goal. Thus were hypotheses about basic physical phenomena proven with data and code.

Today’s concept of data-driven business springs from this rootstock. Data and code together are expected to answer a plethora of questions about everything from the circumstances in which machines might fail to how customers may respond to a particular marketing approach. The data scientist was born and went on to hold the sexiest job of the 21st century.

One problem with data science and, indeed, the whole concept of data-driven business is, in my opinion, that the scope and characteristics of even the simplest real-life business, economic, and political questions are of completely different orders of complexity than basic science problems, however complex their real-world impact.

One of the most inspiring stories in computing describes the work of the “computers” at NASA from the 1940s to the 1960s: women who crunched data and coded the equations of motion to finally land a man on the Moon. The mathematical and computational challenges were enormous despite all the basic Newtonian physics being known. Although the computers got Apollo 11 to the Moon more precisely than any human pilot could have done, it required the skills and expertise of Neil Armstrong to land the Eagle on the surface when confronted with an unexpected boulder field at the planned landing spot.

Human behavior — within and outside business — is strewn with such boulder fields of the unexpected and the unpredictable, few of which adhere to any known mathematical equations.

You may argue that today’s autonomous Eagle, equipped with advanced robotics and artificial intelligence, can negotiate a random boulder field on the Moon or Mars. You may well be correct. However, these situations remain calculable by the laws of classical physics. The boulder fields of heart and mind are not and, I sincerely hope, will never be.

Isaac Asimov’s vision of psychohistory in his Foundation series (where mass behavior can be forecast with complete accuracy but where the action of a single individual cannot be predicted) makes for wonderful fiction. Yet marketers and advertisers persist in the belief that individuals’ discrete actions and preferences at a point in time are predictable, so much so that they have invested in a vast surveillance system that tries to track every human activity on- and offline, analyzing the resulting data with machine learning algorithms whose internal workings are already almost beyond human comprehension.

Scratching beneath the surface of marketing success stories and real achievements, we find a range of bad decisions already emerging. The recent first fatality caused by an autonomous vehicle has been traced back in part to an algorithm unable to predict human behavior. In Weapons of Math Destruction, Cathy O’Neil has shown beyond reasonable doubt that current algorithms magnify and entrench human biases rather than eradicating them, as some observers would like to believe.

Increasingly, algorithms developed and used in data science today are produced by machine learning systems rather than directly by data scientists. They are, however, still largely decipherable by humans. When they go wrong, we can usually figure out why and fix them. In such cases, the role of the data scientist becomes more of a results checker and interpreter than an innovator.

Of more concern are algorithms that learn from experience and undertake a process of “self-improvement” to generate better algorithms. This approach has already taken over high-frequency trading on the stock market, resulting in regular “mini-flash-crashes” as long ago as 2014. The lesson: given huge data volumes and velocities, simple goals, and a complex ecosystem of competing and evolving algorithms, the stock market is assuming some characteristics of a chaotic system whose behavior could be subject to potentially cataclysmic state changes.

Recent developments in artificial intelligence formalize this competitive approach as “reinforcement learning.” The power of this method has been amply demonstrated by Google DeepMind’s AlphaZero, a generalized version of the champion-defeating AlphaGo, which has now taught itself how to play (and win) chess and the Japanese game shogi without any human-generated training data. According to Demis Hassabis, DeepMind CEO: “It plays in a[n] … almost alien way.”

When confined to game playing, these developments are fascinating. However, the danger arises when they are applied — as they inevitably are or will be — to real-world situations under the auspices of data-driven business, where the goals are simplistic, the rules are less clear-cut and subject to human interpretation or interference, and the possible outcomes either dangerous or discriminatory. Data scientists — at least as the role is currently defined — have the wrong skills and career aspirations to engage successfully in this emerging environment.

AI-supported decision making, therefore, is at a minimum fundamentally changing the role of the data scientist. Science, defined as a systematic activity to acquire knowledge that describes and predicts the natural world, is becoming embedded in the AI systems themselves. Humans see less and less data despite businesses collecting ever greater quantities. The future need is not for data scientists. Rather, we will need “robot overseers” who can intuit more than understand what the algorithms are producing, why they do so, and whether the outcomes are socially and ethically what we actually want.
