How Will Data Science Evolve Over the Next Decade?
Okay, let’s first admit that we’re all living in the century of data now. Machine learning and data analysis technologies have already become an integral part of our present-day life. So, what comes next?
In this post, I am not going to declare whether the future of data science will be bright, auspicious, or unpromising. Instead, I will draw on my own experience and that of the people I have met, and bring some decisive factors together to make a few predictions.
With that in mind, I decided to outline the key factors that will shape data science over the next 10 years. I hope it will bring you some valuable insights concerning your workflow. Needless to say, these are just my personal predictions. If you are interested, just keep reading!
The Future of Data Science: How Do I See It?
#1 More Data Science Strategies
Data science is nothing but a quantitative approach to a problem. In the past, due to a lack of data and/or processing power, we relied on other things, like ‘authoritarian whim’, ‘expert intuition’ and ‘general consensus’. Today that barely works, and it will undoubtedly be even less effective 10 years from now. Data scientists, in turn, are building systems that can speak, predict, anticipate and give real results.
The bubble around data science skills isn’t set to burst. On the contrary, data-driven strategies will continue to gain prevalence. More people will look at data and gain insights from it, which may make the data science team an integral part of most, if not all, successful organizations. It may even fuel competition and the desire to be on top.
#2 More Clearly Defined Roles
So, data science will become more popular, and for the majority of customers it will be clearer what data scientists actually do. Today, “data scientist” is a broad catch-all title. People in the industry have used the designations and descriptions a bit loosely, so there is a lot of confusion around who does what.
We typically separate the data roles into 4 distinct but overlapping positions:
- Data Architect: develops data architecture to effectively capture, integrate, organize, centralize and maintain data.
- Data Analyst: processes and interprets data to get actionable insights for a company.
- Data Scientist: performs data analysis once data volume and velocity reach a level requiring sophisticated technical skills.
- Data Engineer: develops, tests and maintains data architectures to keep data accessible and ready for analysis.
I think over time, all of these roles will become more familiar to us, so we will better understand the differences. Therefore, customers will have more realistic expectations of what can and can’t be achieved, and see a clearer picture of the workflow and what benefits they may have.
#3 More Soft Skills in Demand
Over time, it will become clear that plenty of data scientists are proficient in Python or R. But what about the ability to sell ideas to management and to convince others that your insight is worth pursuing? Visualization does half the job, but the other half is plain old marketing. Consequently, we may see a shift towards those who know how to lead critical conversations around product challenges. So, those who can combine hard skills with soft skills will always be in demand.
#4 More Data, More AI to Handle It
Now let’s talk about serious things. The amount of data we create every day is truly mind-blowing. There are 2.5 quintillion bytes of data created each day at our current pace, but that pace is only accelerating. Just look at some key daily statistics highlighted in the infographic made by Raconteur:
- 500 million tweets are sent
- 294 billion emails are sent
- 4 petabytes of data are created on Facebook
- 4 terabytes of data are created from each connected car
- 65 billion messages are sent on WhatsApp
- 5 billion searches are made
By 2025, it is predicted that 463 exabytes of data will be created each day globally. For scale, a single exabyte already fills 212,765,957 DVDs (at 4.7 GB each), so that daily total would be the equivalent of roughly 98.5 billion DVDs!
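If you want to sanity-check the DVD comparison yourself, here is a minimal sketch. It assumes decimal units (1 exabyte = 10^18 bytes) and a 4.7 GB single-layer DVD. Both are my assumptions, not something the infographic spells out, and the function name is just for illustration. Under those assumptions, the 212,765,957 figure corresponds to one exabyte, and 463 exabytes come out at roughly 98.5 billion DVDs.

```python
# Assumptions (not stated in the infographic): decimal units and
# a 4.7 GB single-layer DVD.
BYTES_PER_EXABYTE = 10**18
BYTES_PER_DVD = 4_700_000_000


def dvds_needed(exabytes: float) -> float:
    """Return how many DVDs would be needed to hold the given volume."""
    return exabytes * BYTES_PER_EXABYTE / BYTES_PER_DVD


per_exabyte = dvds_needed(1)      # DVDs per single exabyte
per_day_2025 = dvds_needed(463)   # DVDs for the predicted daily volume

print(f"1 EB   ~ {per_exabyte:,.0f} DVDs")
print(f"463 EB ~ {per_day_2025:,.0f} DVDs")
```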
Realistically, data scientists alone cannot manage and process these vast volumes of data. It is quite possible, then, that AI will become a valuable tool to assist them in processing it. Automated tools for statistical analysis and machine learning will become “smart” enough to replace data scientists in routine tasks, such as exploratory data analysis, data cleaning, statistical modeling, and building machine learning models.
#5 Less Code. Much Less Code
According to A. Karpathy, director of AI at Tesla, we will no longer write code in the near future. We will just find the data and enter it into machine learning systems. In this scenario, the software engineer turns into a “data curator”. Most of tomorrow’s programmers will no longer maintain complex software repositories or write complex programs. Karpathy said that they will collect, clean, manipulate, label, analyze and visualize the data that feeds the neural networks.
Machine learning is ushering in a new paradigm of computing, where training machines is the key skill. As we continue to democratize ML technology and reach higher levels of abstraction with our tools we will see much of our coding drop away. Eventually, the majority of steps in creating products will be drag, drop, swipe, point, and click. This frees practitioners to be more strategic and creative in how they solve problems. Ever see anyone on Star Trek program computers? Exactly.
Does it mean tools like R, Python, and Spark will become irrelevant and most data scientists won’t need to write code to perform statistical analysis or to train machine learning models? Well, I don’t think so. In any case, pinning your hopes on this development has very little value. You will still need to know and understand all the processes, and machine learning will simply make your routine tasks easier.
#6 APIs Used Whenever Possible
The majority of companies get started by becoming known for doing one thing well and surfacing their contribution to the community as an open-source API. In 10 years, most software will be crafted by visually tapping into endpoints, and leveraging whatever services are needed to create a solution. Data scientists will be able to rapidly construct their model harness, building and testing multiple algorithms in one shot, and visually validating results with the entire team. There will be much less reinventing the wheel, with deep technical thinking brought in at the most opportune time.
#7 Self-Taught Education
The traditional academic setting is making less and less sense. The information economy demands access to rapidly changing information. By the time people graduate from a 3–4 year degree, what they learned is stale. People are starting to empower themselves by taking control of their own education. The institutions that survive will be those that embrace online, rapidly changing course offerings. Learning will also be defined by what you build, not by fundamentals devoid of real-world application.
Q1. Will Data Scientists Be Replaced with Automated Algorithms?
According to CRISP-DM, the most popular methodology for managing data analysis projects, a data analysis project includes 6 phases, and the analyst or data scientist is directly involved in each of them:
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
Steps 3 and 4 (Data Preparation and Modeling) involve a lot of routine work. To use machine learning to solve specific cases, you must constantly:
- Configure model hyperparameters;
- Try new algorithms;
- Feed the model various representations of the original features (standardization, stabilization of variance, monotone transformations, dimensionality reduction, encoding of categorical variables, creation of new features from existing ones, etc.).
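To make the routine in the list above concrete, here is a minimal sketch of the kind of loop that automated tools run for you. It tries two feature representations (raw and standardized) and several values of one hyperparameter (k in a k-nearest-neighbors classifier), then keeps the best combination. Everything here is an assumption for illustration: the toy data is made up, the helper names are hypothetical, and real AutoML tools search far larger spaces with proper validation.

```python
from itertools import product
from statistics import mean, pstdev

# Made-up toy data: (income in dollars, age in years) -> label.
# The label depends on age only, but income has a much larger scale.
X = [(30_000, 25), (32_000, 47), (58_000, 30), (60_000, 52),
     (31_000, 51), (59_000, 27), (33_000, 29), (61_000, 49)]
y = [0, 1, 0, 1, 1, 0, 0, 1]


def identity(rows):
    """Leave the features as they are."""
    return [tuple(float(v) for v in r) for r in rows]


def standardize(rows):
    """Rescale each column to zero mean and unit variance.

    Simplification: statistics come from all rows, which leaks a little
    information into the leave-one-out evaluation below.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]


def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k nearest training points."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], point)),
    )
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def loo_accuracy(rows, labels, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(rows)):
        train_X = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += knn_predict(train_X, train_y, rows[i], k) == labels[i]
    return hits / len(rows)


transforms = {"raw": identity, "standardized": standardize}
ks = [1, 3, 5]  # odd values avoid tied votes with two classes

# The "routine": score every (representation, hyperparameter) pair.
results = {
    (name, k): loo_accuracy(fn(X), y, k)
    for (name, fn), k in product(transforms.items(), ks)
}
best = max(results, key=results.get)
print("best configuration:", best, "accuracy:", results[best])
```

The point is not the classifier itself but the shape of the work: a mechanical sweep over combinations, which is exactly the part that is easy to hand over to a machine.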
These routine operations, as well as part of the data preparation and cleaning work that analysts or data scientists do, can be eliminated with the help of automation. However, the other parts of steps 3 and 4 and the remaining steps of CRISP-DM will be preserved, so this simplification of analysts’ daily work does not pose any threat to the profession.
Machine learning is just one of a data scientist’s tools, alongside visualization, data exploration, and statistical and econometric methods. And even there, full automation is impossible. The data scientist will undoubtedly keep a major role in solving non-standard problems and in developing and applying new algorithms and their combinations. An automated algorithm can sort through all the standard combinations and produce a baseline solution, which a qualified specialist can take as a basis and further improve. However, in many cases, the results of the automated algorithm will be sufficient without additional improvements, and they can be used directly.
One can hardly expect that a business will be able to use the results of automated machine learning without the help of analysts. In any case, data preparation, interpretation of the results and the other stages of the scheme above will still be required. At the same time, many companies today have analysts who constantly work with data, have an appropriate mindset and are deeply versed in the subject area, but have not mastered machine learning methods at the required level.
It is often difficult for a company to attract highly qualified and well-paid machine learning specialists, for whom demand is growing and exceeds supply many times over. The solution here may be to give the company’s analysts access to automated machine learning tools. This is the democratizing effect that automation has on technology. In the future, the benefits of big data will be available to many companies without the formation of highly professional teams and the involvement of consulting firms.
Q2. Will Data Engineers Be More in Demand Than Data Scientists?
I think it’s time to start distinguishing data scientists and data engineers.
The former are applied mathematicians with a serious education who research the science of data, develop new algorithms, formalize neural networks, etc.
The latter have a slightly different area of interest: they know the theory and the limits of applicability of each method and successfully solve business problems.
The former will always have something to do. The work of the latter can, of course, be partially automated, but never completely. New methods, algorithms, and approaches will always appear. In addition, expert understanding of the subject area and the nature of the data, along with understanding the customer’s goals and the ability to achieve them quickly, cannot be replaced by fully automated methods and will always be extremely important.
Data science is a real thing — but the world is moving to a functional data science world where practitioners can do their own analytics. You need data engineers, more than data scientists, to enable the data pipelines and integrated data structures.
Smart organizations have smart people who should know their own data. The reason data scientists exist is that most organizations aren’t data-savvy yet. They will be.
If a data scientist creates a breakthrough algorithm, and there is no data engineer to put it into production for use by the business, does it have any value?
I will repeat my favorite statistic from Gartner that only 15% of big data projects ever make it into production. And while they never dig into the reasons why 85% of big data projects never make it there, I will propose that there are several key reasons why they fail:
- They never find an insight worth putting into production
- They find an insight and build a model but fail to build a production pipeline that can run within the service level agreement on a repeatable basis
- They don’t need an insight, because the data analysis they want to run isn’t dependent on some complicated model, but they still fail to build a production pipeline that can run within the service level agreement on a repeatable basis
That’s why companies need at least two data engineers for every data scientist.
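To show what “a production pipeline that can run within the service level agreement on a repeatable basis” might mean in the simplest terms, here is a hedged sketch. The step functions, the `run_pipeline` helper and the one-second budget are all hypothetical names and values I made up for illustration. A real pipeline would add scheduling, retries, monitoring and much more.

```python
import time
from typing import Callable, List, Tuple


def run_pipeline(
    steps: List[Callable[[list], list]],
    data: list,
    sla_seconds: float,
) -> Tuple[list, float, bool]:
    """Run the steps in order and report whether the run met the SLA."""
    start = time.perf_counter()
    for step in steps:
        data = step(data)
    elapsed = time.perf_counter() - start
    return data, elapsed, elapsed <= sla_seconds


def clean(rows: list) -> list:
    """Drop missing records (a stand-in for real data cleaning)."""
    return [r for r in rows if r is not None]


def score(rows: list) -> list:
    """Stand-in for applying a trained model to each record."""
    return [r * 2 for r in rows]


result, elapsed, within_sla = run_pipeline(
    [clean, score], [1, None, 3], sla_seconds=1.0
)
print(result, f"{elapsed:.6f}s", "SLA met" if within_sla else "SLA missed")
```

The model is the smallest part of this picture. The rest (ordering the steps, running them the same way every time and checking the time budget) is the data engineer’s job.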
Wrapping it up…
The future of the data scientist profession is still vague and subject to expert judgment. However, new libraries and tools appear every day, and we are by no means on a path toward simplifying the infrastructure, either in development or in building business models. Many will say the result is reliable and sound, and that is fine, but there is a downside: the more complex the systems we build, the more stochastic and probabilistic they turn out to be.
The main problem with the current state of AI is the absence of intuition in forecasting. We have only a quantitative approach to solving a particular problem and making forecasts about it, not a qualitative one. Until now this approach has worked very well, but what comes next?