WHY THE MAJORITY OF DATA SCIENCE PROJECTS NEVER MAKE IT TO PRODUCTION
Today, most large companies are exploring the potential of AI/ML. Yet despite significant investment in hiring data scientists and committing time and money, data science often fails to take things to the next level.
One of the biggest challenges in AI/ML is that the large majority of models are never deployed to production. Many people in enterprises have realised that while developing a machine learning or data science model may take only a few weeks, putting that model into production takes far longer, sometimes more than a year.
Sometimes, when a team starts re-architecting the whole ML pipeline with deployment in mind, the work done so far can go to waste, because deployment pipelines, deployment assumptions and the way models are built for deployment are quite different from those used during experimentation. Is data science enterprise-ready?
In a Gartner survey of more than 3,000 AI-aware C-level executives, only 20% reported having AI in production, while 80% said they were developing, experimenting with or contemplating the use of AI. In another report, McKinsey found that of 160 AI use cases it reviewed, 88% did not progress beyond the experimental stage.
As the market for AI technologies and techniques grows and matures, companies need broader and better access to innovative AI models, applications and platforms. But unless a model is in production, there is no return on the investment.
“Technology innovation leaders are keen to apply DevOps principles for AI/ML projects, but they often struggle with architecting a solution for end-to-end automation pipelines across data preparation, model building, deployment and production because of lack of process and tooling know-how,” says Gartner.
In many companies, management may not be equipped to understand data science. You may have the best model in the world, but if management doesn’t recognise its value, it is probably not going into production. Often, business intelligence and conventional software stacks offer clearer value to an organisation than complex data science systems. Given the high cost of developing AI projects, many organisations are reluctant to invest in the staff and software required to deliver on the promise of AI.
Many data science models never survive the proof-of-concept (PoC) stage and are abandoned. The causes vary, but they often come down to a lack of fundamental data literacy at senior levels, which leads to data science being ignored.
For the most part, the reason models are not deployed comes down to resources: the technology is new, and most IT-led companies are simply unfamiliar with the tools and specialised hardware needed to deploy data science models successfully.
One of the essentials of data science is choosing the right problem and pursuing the right solution. But teams often get caught up in complicated technical details and find themselves a year later having added no value. Data science projects frequently end up more complicated than the business value they are meant to produce can justify.
Data Collection Issues
According to experts such as Bill Inmon, the vast majority of data scientists spend most of their time as data collectors: consolidating disparate data sources, and formatting and cleaning data. Sourcing, understanding, organising and cleaning data are the most difficult parts of most AI projects.
Most organisations have highly siloed data, which makes it very difficult to put a model into production. It is not only the data: ML pipelines are also built in isolation rather than in a connected manner, which leads to a lack of collaboration among team members.
Collecting the required data is itself a challenging task. Data typically exists in many different formats, structured and unstructured (video files, text and images), stored in various places, each with its own security and privacy constraints. Because all of it must be consolidated and cleaned, implementing AI becomes difficult. Cleaning unstructured or unformatted data can consume most of a project’s time and sap a team’s motivation. Insufficient data for the analysis can also contribute to failed AI projects.
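To make the consolidation and cleaning work more concrete, here is a minimal Python sketch that assumes two hypothetical siloed sources: a CRM export in CSV and a web-app dump in JSON. The field names, sample records and cleaning rules (trimming and lower-casing emails, parsing two different date formats, dropping rows without an ID, keeping the first record seen per customer) are illustrative assumptions, not taken from any real system.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical exports from two siloed systems, in two different formats.
CRM_CSV = """customer_id,email,signup_date
101, Alice@Example.com ,2021-03-04
102,bob@example.com,2021-05-17
,missing@example.com,2021-06-01
"""

WEB_JSON = """[
  {"id": "102", "mail": "BOB@example.com", "joined": "17/05/2021"},
  {"id": "103", "mail": "carol@example.com", "joined": "02/07/2021"}
]"""


def clean_email(raw):
    """Normalise whitespace and case so the same address matches across sources."""
    return raw.strip().lower()


def load_crm(text):
    """Map CRM CSV rows onto a common schema; skip rows with no customer ID."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["customer_id"].strip():
            continue  # a record without a key cannot be joined to anything
        records.append({
            "customer_id": int(row["customer_id"]),
            "email": clean_email(row["email"]),
            "signup_date": datetime.strptime(row["signup_date"].strip(), "%Y-%m-%d").date(),
        })
    return records


def load_web(text):
    """Map web-app JSON records (different field names, day-first dates) onto the same schema."""
    return [
        {
            "customer_id": int(item["id"]),
            "email": clean_email(item["mail"]),
            "signup_date": datetime.strptime(item["joined"], "%d/%m/%Y").date(),
        }
        for item in json.loads(text)
    ]


def consolidate(*sources):
    """Merge sources in order, keeping the first record seen for each customer_id."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["customer_id"], record)
    return sorted(merged.values(), key=lambda r: r["customer_id"])


if __name__ == "__main__":
    for record in consolidate(load_crm(CRM_CSV), load_web(WEB_JSON)):
        print(record)
```

Even in this toy case, most of the code is schema mapping and cleaning rather than modelling, and every rule (which source wins a conflict, what counts as a valid row) is a decision someone has to make and maintain.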
Incompatibility With Enterprise Systems
Data scientists use languages like Python that may not be compatible with the languages used in production systems. To make a model work with existing systems, it must be recoded and fully retested before deployment, which takes a great deal of time. The process can take months, and by the time the model is ready for production, it may no longer be needed.
To deploy a model in production, a data science team might need to work with an engineer to reimplement it in Java or another language the enterprise uses. This requires constant, iterative effort, because otherwise the model can become useless as new data is added.
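The recode-and-retest loop can be sketched in Python, assuming a hypothetical logistic-regression model with two made-up features. The trained model is exported to plain JSON; `ported_predict` stands in for the Java re-implementation (written in Python here only to keep the example self-contained) and reads nothing but that JSON; `check_parity` is the retest step. All function names, features and data below are illustrative assumptions.

```python
import json
import math


def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit a tiny logistic-regression model with batch gradient descent (stdlib only)."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = python_predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def python_predict(weights, bias, x):
    """The data scientist's scoring function: sigmoid of a weighted sum."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def export_spec(weights, bias, feature_names, version):
    """Serialise the model to plain JSON that a service in another language could read."""
    return json.dumps({
        "model_type": "logistic_regression",
        "version": version,
        "features": feature_names,
        "weights": weights,
        "bias": bias,
    })


def ported_predict(spec_json, features):
    """Hypothetical stand-in for the production port: it sees only the exported JSON
    and looks features up by name, as a separate codebase would have to."""
    spec = json.loads(spec_json)
    z = sum(w * features[name] for w, name in zip(spec["weights"], spec["features"]))
    return 1.0 / (1.0 + math.exp(-(z + spec["bias"])))


def check_parity(weights, bias, spec_json, feature_names, samples, tol=1e-9):
    """Retest the port: every sample must score the same in both implementations."""
    for x in samples:
        expected = python_predict(weights, bias, x)
        actual = ported_predict(spec_json, dict(zip(feature_names, x)))
        if abs(expected - actual) > tol:
            return False
    return True


if __name__ == "__main__":
    names = ["tenure_years", "support_tickets"]  # hypothetical features
    X = [[1.0, 5.0], [4.0, 0.0], [0.5, 7.0], [6.0, 1.0]]
    y = [1, 0, 1, 0]
    w, b = train_logistic(X, y)
    spec = export_spec(w, b, names, version=1)
    print("v1 parity:", check_parity(w, b, spec, names, X))

    # New data arrives: retrain, re-export and re-verify before the port is updated.
    X2, y2 = X + [[2.0, 6.0]], y + [1]
    w2, b2 = train_logistic(X2, y2)
    spec2 = export_spec(w2, b2, names, version=2)
    print("v2 parity:", check_parity(w2, b2, spec2, names, X2))
```

Exporting parameters to a language-neutral format is only one way to reduce porting effort; the point of the sketch is that every retrained version has to be re-exported and re-verified against the production implementation, which is the recurring cost described above.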