Accelerating Machine Learning Lifecycle with a Feature Store

Source: infoq.com

A Feature Store is a core part of next-generation ML platforms that helps data scientists accelerate the delivery of ML applications. It enables teams to track and share versioned features and to serve those features for model training, batch predictions, and real-time predictions. Mike Del Balso from Tecton.ai and Geoff Sims from Atlassian recently spoke at the Spark AI Summit 2020 conference about feature-store-driven ML development.

Del Balso described shortfalls in the machine learning process, such as limited predictive data, long development cycles, and a painful path to production that typically involves multiple teams, a lot of resources, and different implementations. He spoke about Operational ML, which consists of applied ML solutions that drive user experiences in use cases like fraud detection, click-through rate (CTR) prediction, recommendation, and search. Building Operational ML applications is very complex, and data is at the core of that complexity. The actual ML code is a small portion of the overall effort compared to tasks like configuration, data collection, feature engineering, and resource management.

Features are major building blocks of any ML application, but the current tooling for managing them is not where it needs to be. The process of deploying and operating feature pipelines in production, including feature engineering and feature serving, needs to be automated.

Del Balso discussed Tecton, a data platform for machine learning applications that automates the full operational lifecycle of features so that data science teams can manage them throughout a typical ML process. It can extract data from batch or real-time data sources, transform that data in feature pipelines, and organize the resulting feature values in a Feature Store. Data platforms for ML solve critical problems like managing sprawling and disconnected feature transformation logic, building quality training sets from messy data, and deploying to production.
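The extract, transform, store, and serve flow described above can be illustrated with a minimal in-memory sketch. This is not Tecton's API: the names `FeatureDefinition`, `InMemoryFeatureStore`, `materialize`, `get_online_features`, and `get_training_rows`, as well as the sample transaction data, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


# Hypothetical names throughout; this is an illustration, not a real library API.
@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    entity_key: str                            # field identifying the entity, e.g. "user_id"
    transform: Callable[[List[dict]], float]   # maps one entity's raw rows to a single value


class InMemoryFeatureStore:
    def __init__(self) -> None:
        self._definitions: Dict[str, FeatureDefinition] = {}
        self._values: Dict[str, Dict[str, float]] = defaultdict(dict)

    def register(self, definition: FeatureDefinition) -> None:
        self._definitions[definition.name] = definition

    def materialize(self, rows: Iterable[dict]) -> None:
        """Run every registered transformation over the raw rows and store the results."""
        rows = list(rows)
        for definition in self._definitions.values():
            grouped: Dict[str, List[dict]] = defaultdict(list)
            for row in rows:
                grouped[row[definition.entity_key]].append(row)
            for entity_id, entity_rows in grouped.items():
                self._values[entity_id][definition.name] = definition.transform(entity_rows)

    def get_online_features(self, entity_id: str, names: List[str]) -> Dict[str, float]:
        """Serve stored values for one entity (real-time prediction path)."""
        return {name: self._values[entity_id][name] for name in names}

    def get_training_rows(self, names: List[str]) -> List[dict]:
        """Serve stored values for all entities at once (batch / training path)."""
        return [
            {"entity_id": entity_id, **{name: values[name] for name in names}}
            for entity_id, values in sorted(self._values.items())
        ]


# Made-up raw data standing in for a batch source.
transactions = [
    {"user_id": "u1", "amount": 20.0},
    {"user_id": "u1", "amount": 40.0},
    {"user_id": "u2", "amount": 5.0},
]

store = InMemoryFeatureStore()
store.register(FeatureDefinition("txn_count", "user_id", lambda rows: float(len(rows))))
store.register(
    FeatureDefinition(
        "avg_amount", "user_id", lambda rows: sum(r["amount"] for r in rows) / len(rows)
    )
)
store.materialize(transactions)

online = store.get_online_features("u1", ["txn_count", "avg_amount"])
training = store.get_training_rows(["txn_count", "avg_amount"])
```

The point of the sketch is that training and serving read from the same stored values, produced by the same transformation code, which is one way to avoid maintaining separate implementations for each path.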

ML features are highly curated data in a business, yet they are among the most poorly managed assets. Since each ML model typically has hundreds, if not thousands, of features to manage, this makes it difficult to scale ML efforts in an organization. He recommended managing a feature as both the feature data and the transformation code used to generate it.
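One way to read this recommendation is that each feature value should be traceable to the exact version of the code that produced it. The sketch below assumes a simple scheme in which every change to a transformation registers a new integer version; `FeatureVersion`, `FeatureRegistry`, and the `order_value` feature are hypothetical names, not part of any product mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical classes for illustration only.
@dataclass(frozen=True)
class FeatureVersion:
    name: str
    version: int
    description: str
    transform: Callable[[dict], float]


class FeatureRegistry:
    def __init__(self) -> None:
        self._versions: Dict[str, List[FeatureVersion]] = {}
        # Values are keyed by (feature name, version, entity id) so that data
        # produced by different versions of the code never overwrite each other.
        self._values: Dict[Tuple[str, int, str], float] = {}

    def register(
        self, name: str, description: str, transform: Callable[[dict], float]
    ) -> FeatureVersion:
        """Add a new version of a feature; versions start at 1 and only grow."""
        history = self._versions.setdefault(name, [])
        feature = FeatureVersion(name, len(history) + 1, description, transform)
        history.append(feature)
        return feature

    def get(self, name: str, version: Optional[int] = None) -> FeatureVersion:
        """Return a specific version, or the latest one when no version is given."""
        history = self._versions[name]
        return history[-1] if version is None else history[version - 1]

    def compute(
        self, name: str, entity_id: str, row: dict, version: Optional[int] = None
    ) -> float:
        feature = self.get(name, version)
        value = feature.transform(row)
        self._values[(name, feature.version, entity_id)] = value
        return value

    def lookup(self, name: str, version: int, entity_id: str) -> float:
        return self._values[(name, version, entity_id)]


registry = FeatureRegistry()
registry.register(
    "order_value", "order amount in dollars", lambda row: row["amount_cents"] / 100
)
registry.register(
    "order_value",
    "order amount in dollars, capped at 100",
    lambda row: min(row["amount_cents"] / 100, 100.0),
)

order = {"amount_cents": 25000}
v1_value = registry.compute("order_value", "o1", order, version=1)
latest_value = registry.compute("order_value", "o1", order)
```

A model trained on version 1 can keep requesting version 1 while newer models adopt version 2, because both the code and the values remain addressable by version.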

He discussed some common challenges with assembling training data, like stitching multiple data pipelines together, data leakage, and delivering training data to training jobs. Data science and engineering teams also face problems when deploying their models to production and moving from a batch environment to real-time. Some of these challenges relate to infrastructure provisioning and to monitoring drift and data quality. An enterprise-grade Feature Store can manage both feature training and feature serving.
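Data leakage in training sets often comes from joining a label to a feature value that was only computed after the labeled event happened. A common remedy is a point-in-time ("as of") join. The sketch below is one minimal way to do this with the standard library, assuming integer timestamps and a made-up `feature_history`; the helper `value_as_of` is hypothetical, and it treats a feature value written at exactly the event time as available, which is a design choice a real system would need to settle.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical feature history: entity id -> (timestamp, value) pairs sorted by timestamp.
feature_history: Dict[str, List[Tuple[int, float]]] = {
    "u1": [(10, 1.0), (20, 3.0), (30, 7.0)],
    "u2": [(15, 2.0)],
}


def value_as_of(entity_id: str, event_time: int) -> Optional[float]:
    """Return the latest feature value written at or before event_time, else None."""
    history = feature_history.get(entity_id, [])
    timestamps = [ts for ts, _ in history]
    # Number of history entries with timestamp <= event_time.
    idx = bisect_right(timestamps, event_time)
    return history[idx - 1][1] if idx > 0 else None


# Label events as (entity id, event time, label). Values are made up.
label_events = [("u1", 25, 1), ("u1", 5, 0), ("u2", 15, 1)]

training_rows = [
    {
        "entity_id": entity_id,
        "event_time": event_time,
        "feature": value_as_of(entity_id, event_time),
        "label": label,
    }
    for entity_id, event_time, label in label_events
]
```

For the `u1` event at time 25, the join picks the value written at time 20 rather than the later value written at time 30, so the training row only contains information that would have been available when the prediction was made.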

Geoff Sims from Atlassian talked about how they used the Feature Store solution to automate content categorization in one of their popular products, Jira, by automatically labeling every issue tracked in Jira. They used the feature store to collect a large volume of events, store the features per model and update them in real time, and generate the features and predictions.
