Avoiding Garbage in Machine Learning

Source: spglobal.com

“Garbage in, garbage out” – it’s a cliché in machine learning circles. Anyone who works with artificial intelligence (AI) knows that the quality of the data goes a long way toward determining the quality of the result. But “garbage” is a broad and expanding category in data science – poorly labeled or inaccurate data, data that reflects underlying human prejudices, incomplete data. To paraphrase Tolstoy, great datasets are all alike, but all garbage datasets are garbage in their own unique and horrible ways.
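As a rough illustration of the categories above, the following Python sketch counts a few kinds of garbage in a small set of records. The record layout, field names, label set, and plausibility bounds are all hypothetical, chosen only for the example. The sketch covers incomplete, inaccurate, and poorly labeled rows; it makes no attempt to detect data that reflects human prejudice, which generally needs domain review rather than a simple rule.

```python
from collections import defaultdict


def audit_records(records, allowed_labels, ranges):
    """Count simple data-quality problems in a list of dict records.

    records: dicts holding the features named in `ranges` plus a "label" key
             (a hypothetical schema used only for this sketch).
    allowed_labels: set of labels considered valid.
    ranges: feature name -> (low, high) plausible bounds.
    """
    report = {"incomplete": 0, "out_of_range": 0,
              "bad_label": 0, "conflicting_label": 0}
    labels_by_features = defaultdict(set)

    for rec in records:
        values = [rec.get(name) for name in ranges]
        # Incomplete: at least one expected feature is missing.
        if any(value is None for value in values):
            report["incomplete"] += 1
            continue
        # Inaccurate: a value falls outside its plausible bounds.
        if any(not (low <= rec[name] <= high)
               for name, (low, high) in ranges.items()):
            report["out_of_range"] += 1
        # Poorly labeled: the label is not one we recognize.
        if rec.get("label") not in allowed_labels:
            report["bad_label"] += 1
        labels_by_features[tuple(values)].add(rec.get("label"))

    # Poorly labeled: identical features carry different labels.
    report["conflicting_label"] = sum(
        1 for labels in labels_by_features.values() if len(labels) > 1
    )
    return report


# Hypothetical records, one problem per row after the first.
records = [
    {"age": 34, "income": 52000, "label": "approve"},
    {"age": 34, "income": 52000, "label": "deny"},       # conflicts with row 1
    {"age": None, "income": 41000, "label": "approve"},  # incomplete
    {"age": 212, "income": 30000, "label": "approve"},   # implausible age
    {"age": 45, "income": 60000, "label": "aprove"},     # misspelled label
]
RANGES = {"age": (0, 120), "income": (0, 10_000_000)}
report = audit_records(records, {"approve", "deny"}, RANGES)
print(report)
```

A report like this only flags candidates for review; deciding whether a flagged row is truly garbage still depends on knowing the data.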

People believe in machine learning. Israeli philosopher and historian Yuval Noah Harari popularized the term “dataism” to describe a blind faith in algorithms. This faith extends beyond machine learning’s ability to analyze data. Many people believe machine learning is auto-magically able to predict the future. This is inaccurate. Machine learning is excellent at identifying patterns in large, well-labeled datasets. In certain cases, those patterns will continue to unfold in the future. In other cases, they won’t. However, the machine-learning researcher must assume responsibility for the perception that AI is predictive. Mitigating the effects of garbage data becomes a moral imperative when your algorithm is being used to make parole decisions or to invest billions in hard-earned pension dollars.

Faith in machine learning isn’t misplaced. Machine learning is considered more objective than other methods of data analysis because it is demonstrably superior. It can produce models that analyze larger, more complex datasets than conventional methods and deliver more accurate results, all at scale. When used properly, machine learning can help companies identify profitable opportunities and avoid risks. But even a technology so advanced can’t perform the alchemy of turning garbage into gold. Learning to identify “garbage” data is the first step in unlocking machine learning’s vast potential.

“Machine learning is not magic,” cautions Nick Cafferillo, chief data and technology officer at S&P Global. “It’s about augmenting our thought processes to help prove – or disprove – a hypothesis we have; a hypothesis that is founded on real business questions that we understand in the abstract.” 

Garbage data is a concern when you’re working with machine learning because there are two opportunities for the garbage to mess up your results. First, if you train your machine learning model with garbage data, you bake bad data into the underlying algorithm. Feed good data into a neural network trained on garbage and your results may be inaccurate. Alternatively, you could train your neural network on great data and then run garbage data through the well-trained algorithm. Either way, your output will be questionable.
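One way to guard both entry points is to apply the same validation at training time and at prediction time. The Python sketch below is a minimal illustration under stated assumptions: `is_clean`, `GuardedMeanModel`, the `temp_c` feature, and its bounds are hypothetical names invented for the example, and the "model" simply predicts the mean training target as a stand-in for a real neural network.

```python
def is_clean(row, ranges):
    """True when every expected feature is present and within its bounds."""
    return all(
        row.get(name) is not None and low <= row[name] <= high
        for name, (low, high) in ranges.items()
    )


class GuardedMeanModel:
    """Toy model that predicts the mean training target.

    It stands in for any real model; the point is the validation gate
    in both fit() and predict(), not the model itself.
    """

    def __init__(self, ranges):
        self.ranges = ranges
        self.mean_ = None

    def fit(self, rows, targets):
        # Gate 1: keep garbage rows out of training.
        kept = [t for row, t in zip(rows, targets)
                if is_clean(row, self.ranges)]
        if not kept:
            raise ValueError("no clean training rows")
        self.mean_ = sum(kept) / len(kept)
        return self

    def predict(self, row):
        if self.mean_ is None:
            raise RuntimeError("call fit() before predict()")
        # Gate 2: refuse to score garbage input instead of guessing.
        if not is_clean(row, self.ranges):
            return None
        return self.mean_


# Hypothetical sensor data: the third training row has an implausible reading.
ranges = {"temp_c": (-50, 60)}
train_rows = [{"temp_c": 20}, {"temp_c": 25}, {"temp_c": 999}]
targets = [10.0, 20.0, 1000.0]

model = GuardedMeanModel(ranges).fit(train_rows, targets)
good = model.predict({"temp_c": 22})    # clean input is scored
bad = model.predict({"temp_c": None})   # garbage input is refused
print(good, bad)
```

Returning `None` for bad input is one design choice among several; a production system might instead raise an error or route the row to manual review.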
