Bad Archives - Artificial Intelligence

When A Good Machine Learning Model Is So Bad

aiuniverse — Wed, 30 Jun 2021 09:45:54 +0000

Source – https://www.informationweek.com/

IT teams must work with managers who oversee data scientists, data engineers, and analysts to develop points of intervention that complement model ensemble techniques.

Most managers feel euphoria when implementing a technology meant to enhance the workflow of a team or an organization. But they often overlook the details that help implement the technology successfully. The same sentiment can occur for managers who oversee data scientists, data engineers, and analysts examining machine learning initiatives.

Every organization seems to be in love with machine learning. Because love is blind, so to speak, IT teams become the first line of defense in protecting that euphoric feeling. They can start by helping managers appreciate how models fit observations from data sources. Appreciating the statistical balance in data models is essential for establishing management practices that minimize the errors behind very poor real-world decisions. Overfitting and underfitting are a key part of that discussion.

Overfitting and underfitting describe how the performance of a model or machine learning algorithm on training data compares with its performance on production data. An analyst can see good performance on the training data but then get results that generalize poorly to a new data sample or, even worse, in production.

So how does all of this work in practice? Overfitting means the model treats noise in the training data as a reliable signal, when in reality the noise distorts the underlying pattern. The model then predicts poorly on any new dataset that does not contain the same noise, namely the production data. From a statistics standpoint, overfitting occurs if the model or algorithm shows low bias but high variance.

Underfitting introduces a different model performance issue. Intuitively, underfitting means the model or algorithm does not capture the data well enough to represent the statistical relationships within it. From a statistics perspective, underfitting occurs if the model or algorithm shows low variance but high bias.
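The contrast between the two conditions can be illustrated with a small experiment. The sketch below is not taken from the article: it assumes a toy dataset (a sine curve plus Gaussian noise) and a k-nearest-neighbour regressor, where k=1 memorizes the training noise (low bias, high variance) and k equal to the training-set size always predicts the overall mean (high bias, low variance). The function names and parameter values are illustrative choices only.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model on `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


def make_data(n, rng, noise=0.3):
    """Sample n points from y = sin(x) plus Gaussian noise (toy data)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        points.append((x, math.sin(x) + rng.gauss(0.0, noise)))
    return points


rng = random.Random(0)
train = make_data(200, rng)   # stands in for the training data
test = make_data(500, rng)    # stands in for unseen production data

results = {}
for k in (1, 10, len(train)):
    results[k] = (mse(train, train, k), mse(train, test, k))
    print(f"k={k:>3}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

With k=1 the training error is zero because every training point is its own nearest neighbour, yet the error on fresh data is not. That gap between training and unseen-data performance is the symptom the article describes as overfitting. With k equal to the training-set size, both errors are high, which is the underfitting case.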

Both conditions degrade generalization and lead to poor decisions. Generalization is the capacity of a machine learning model to make accurate predictions on unseen data. Getting generalization right is at the heart of establishing a good machine learning model.

One avenue for analysts is to examine the training data to determine whether additional observations can be collected, so that unbalanced datasets are not fed to models. I explained unbalanced datasets in a previous post.

But there are limits to adding observations or adding features. Beyond a certain point, adding more yields no further performance improvement. One example is the Hughes phenomenon: as the number of features increases while the training set stays the same size, a classifying model’s performance improves up to an optimal number of features and then degrades as more features are added. The Hughes phenomenon should certainly remind data professionals of the curse of dimensionality. The number of possible unique rows grows exponentially with the number of features, which matters most for high-dimensional models. The variance increases with the additional features as well. The result is a model with more opportunities to overfit, making accurate generalization harder to establish and raising development inefficiency.
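The exponential growth in possible unique rows can be made concrete with a little arithmetic. The sketch below assumes each feature is discretized into 10 equal bins and that the training set is fixed at 10,000 rows; both numbers are hypothetical and chosen only for illustration.

```python
def grid_cells(bins_per_feature: int, n_features: int) -> int:
    """Number of distinct cells when each feature is cut into equal bins."""
    return bins_per_feature ** n_features


def max_coverage(n_rows: int, bins_per_feature: int, n_features: int) -> float:
    """Upper bound on the fraction of cells that n_rows observations can occupy."""
    return min(1.0, n_rows / grid_cells(bins_per_feature, n_features))


n_rows = 10_000  # hypothetical training-set size, held fixed
for n_features in (1, 2, 4, 8):
    cells = grid_cells(10, n_features)
    coverage = max_coverage(n_rows, 10, n_features)
    print(f"{n_features} features: {cells} cells, at most {coverage:.4%} covered")
```

With the training-set size held constant, each added feature multiplies the number of cells by ten, so the same rows cover a rapidly shrinking share of the feature space. That sparsity is one intuition for why performance can degrade past an optimal number of features.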

Thus, most efforts will involve finding a balance between bias and variance. Having both low bias and low variance is the desired objective but is usually impractical or impossible to achieve. Analysts should focus on cross-validation techniques, such as k-fold cross-validation, and on ensemble methods, like gradient boosting, to minimize the likelihood of implementing a poor model.
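As an illustration of cross-validation, here is a minimal sketch of the k-fold variant written with only the Python standard library. The helper names (`k_fold_indices`, `cross_validate`, `fit_line`, `predict_line`) and the toy linear data are assumptions made for this example and do not come from the article. In practice an analyst would more likely rely on an established library.

```python
import random


def k_fold_indices(n_rows, n_folds, seed=0):
    """Shuffle row indices and deal them into n_folds nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(xs, ys, fit, predict, n_folds=5):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), n_folds):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores


def fit_line(xs, ys):
    """Toy model: ordinary least-squares line, returned as (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]  # toy data
scores = cross_validate(xs, ys, fit_line, predict_line, n_folds=5)
print([round(s, 3) for s in scores], round(sum(scores) / len(scores), 3))
```

Each row is held out exactly once, so the average of the fold scores estimates performance on data the model has not seen. A large spread between folds is a hint of high variance, while uniformly poor scores suggest high bias.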

IT teams must work with managers who oversee data scientists, data engineers, and analysts to develop points of intervention that complement model ensemble techniques. The interaction can also lead to forming robust management processes like observability for incident detection and root-cause reporting. The result is a system that minimizes operational downtime related to data issues. It also produces a process point for managing a balance of bias and variance that protects model accuracy and yields fair outcomes.

Separating signal from noise does not by itself make an outcome ethical. Good judgment is what ensures that it is. Such outcomes are certainly worth a euphoric feeling.


Why ML For ML’s Sake Is A Bad Idea?

aiuniverse — Sat, 13 Feb 2021 05:53:29 +0000

Source – https://analyticsindiamag.com/

Today, businesses are increasingly reliant on artificial intelligence and machine learning to solve critical problems. However, dealing with immense data complexities, along with the pressure to provide rapid results, can be crippling. Most companies find building an ML-savvy framework quite overwhelming. In an engaging session at MLDS 2021, Sayanti Bhattacharya, Senior Manager, and Ashwin Pai, Manager at Ugam, a Merkle Company, addressed how businesses can apply machine learning to drive results.

Common Misconceptions

Machine learning has become such a fashion statement that, more often than not, businesses jump the gun by implementing ML in a hurry, defying logic. Bhattacharya stressed the importance of charting a company’s ML journey the right way. For instance, said Bhattacharya, we often see ML applications in our daily lives in the form of maps, digital ads, object detection technology, personalised notifications, etc. But when it comes to business applications, leaders are daunted by thoughts such as: ML is complicated; it involves coding; the value it brings is uncertain; and it may not be compatible with their business plan.

Pai laid out the ML concept in simple terms:

Taxonomy: It is essential to understand that machine learning is a subset of AI and includes supervised learning, unsupervised learning and reinforcement learning. They are further divided into tree-based models, association rules, neural networks, regression, clustering, similarity algorithms, transfer learning, deep reinforcement learning and more. “There is a whole lot of length and depth associated with machine learning,” stressed Pai.

Common beliefs: Pai also brought up the uncalled-for pressure companies face to implement ML just because it’s trendy. More often than not, companies talk about adopting ML without realising the underlying need for it. They fall for the ‘bigger the better’ trap and end up integrating complex algorithms.

When Should I Use ML?

Bhattacharya said companies need to do a reality check to assess whether ML is critical to their operations. Machine learning can work well for tasks that entail sequential decision making or rule-based decision making. “Having said that, bigger is not always better,” she added.

Picking up where Bhattacharya left off, Pai said the key is to keep ML scalable but straightforward. For instance, in earlier days, apriori algorithms were used for problems such as product affinity, ARIMA for forecasting and logistic regression for classification; as data grew in volume over the years, these have been replaced by more complex approaches such as LSTMs and deep neural networks. While there are many options available to approach a problem, it is crucial to break down the problem and then apply ML, if necessary, said Pai.

Incremental Improvements

Bhattacharya pointed out that for machine learning to create the most value, it is essential to treat ML as a marathon and not a sprint. She said rushing to incorporate ML into an organisation’s workflow may lead to challenges such as data fatigue, infrastructure fatigue, time fatigue and, most importantly, expertise fatigue. “Dealing with data requires collecting, analysing and harmonising it; jumping into it will lead to a stressful journey,” she said. Therefore, ML requires building endurance rather than speed. For instance, she said, areas such as customer reviews and ratings, website search metadata and customer service enhancement can be addressed with text analysis.

To build ML, laying a strong foundation is essential. Detailing a use case, she said they used an elementary application of text mining in a situation where they had to understand what customers were saying about the products on a website. Implementing a solution as simple as this resulted in a 20% reduction in product returns and the identification of unauthorised users.
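The session summary does not describe how the text mining was implemented. As a hedged illustration of what an elementary approach can look like, the sketch below counts term frequencies across customer reviews using only the Python standard library. The reviews, the stopword list and the `term_counts` helper are all invented for this example.

```python
import re
from collections import Counter

# Hypothetical reviews, invented for illustration only.
reviews = [
    "The zipper broke after one week, returning it.",
    "Size runs small. Returning for a larger size.",
    "Great colour but the zipper is stuck.",
]

# A tiny, hand-picked stopword list (an assumption; real lists are longer).
STOPWORDS = {"the", "a", "for", "it", "is", "but", "after", "one"}


def term_counts(texts):
    """Count how often each non-stopword term appears across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


counts = term_counts(reviews)
print(counts.most_common(3))
```

Even a frequency table like this can surface recurring themes, such as a faulty part or a sizing issue, before any more complex model is considered.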

Building on this foundation, they further classified the problem to understand customers’ complaints about order deliveries from customer call logs. The team introduced topic modelling to identify related words and classify them into topics, leading to 30% fewer customer complaints, better order tracking and a better delivery experience.

The next goal was to understand what is important to consumers in a category and what they like and dislike in the current assortment. The team applied topic modelling to identify themes and overlaid a sentiment analysis framework to generate actionable insights. This resulted in a 30% increase in analysing customer feedback and a 25% reduction in cost.

Is layering the only approach to implement ML? “No,” says Bhattacharya. Ensembling various methods is another good option. Use of agglomerative clustering, cluster profiling, rule mining, price grouping and rule-based binning can result in grouping similar listings that point to the same product. “Starting with one element and building upon it is the key,” she stated.
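To make the idea of grouping similar listings more concrete, here is a minimal sketch of single-linkage agglomerative clustering over listing titles, using Jaccard similarity between word sets. It is not the speakers' implementation: the listing titles, the 0.5 threshold and the function names are assumptions made for illustration, and the other steps mentioned (cluster profiling, rule mining, price grouping, rule-based binning) are not shown.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(titles, threshold=0.5):
    """Single-linkage agglomerative clustering on title token overlap.

    Starts with one cluster per title and repeatedly merges the most
    similar pair of clusters until no pair reaches `threshold`.
    """
    tokens = [set(t.lower().split()) for t in titles]
    clusters = [[i] for i in range(len(titles))]

    def linkage(c1, c2):
        # Single linkage: similarity of the closest pair of members.
        return max(jaccard(tokens[i], tokens[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, pair = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical listing titles, invented for illustration only.
listings = [
    "acme steel water bottle 1l blue",
    "acme water bottle steel 1l blue",
    "acme steel water bottle blue 1 litre",
    "zenith wireless mouse black",
    "zenith mouse wireless black usb",
]
groups = agglomerate(listings)
print(groups)
```

The procedure starts with one element per cluster and builds upward, which mirrors the "start with one element and build upon it" advice. The threshold controls how aggressively listings are merged and would need tuning on real data.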

Pai also pointed out that while endurance is great, technology plays an important role in keeping up with these developments. “Improving technology, tech stack, data engineering capabilities will help in maximising the impact,” he said.

Key Takeaways

Not all problems need high-end ML solutions
More complex does not mean better outcomes
Think marathon, not sprint
Make incremental improvements
Build technology/infrastructure capabilities
A measurement framework along with a north-star metric is crucial to measure the value
