Opinion | Machine learning reveals computers as bad students
Artificial intelligence (AI) is deeply linked with machine learning (ML). In fact, almost all of AI today is simply ML—in other words, an attempt to get a computer to make itself more efficient at its task without the need for human intervention. As an investor in deep-tech and science companies, I have had the occasion to see several startups that claim to use AI/ML.
Neither AI nor ML is “deep-tech”. The applicability of ML is limited, at least today, primarily to the field of data science, where one is actually only trying to ask simple questions of a data set.
Most of these questions revolve around whether there is a pattern to the data that is present in the data set, and seek to answer fairly simple questions, such as, “Is this customer likely to buy product X if they have already bought product Y?” or “Does this medical scan contain evidence of cancer?”
ML tries to filter out the “noise” from a data set and arrive at a “signal”. This is the realm of “data science”. Data science draws on inductive reasoning, as opposed to the deductive reasoning of arithmetic and algebra. While the conclusion of a deductive process is certain, the conclusion of an inductive process is only probable. Statistical modelling allows an ML program to systematically quantify and reason about the inherent uncertainties of inductive reasoning.
Every data set thrown at an ML model is confusing to some degree, especially when it is large.
The confusion in these large data sets will mean that there are four possible outcomes while looking for a “signal”: a) that the actual data point represents a true positive (as in yes, this scan shows cancer); b) that the data point represents a true negative (as in there is no cancer); c) a false positive (as in, yes, this scan indicates cancer, when in fact it doesn’t); and d) a false negative (as in, no, this scan doesn’t indicate cancer, when in fact cancer is present).
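To make the four outcomes concrete, here is a minimal Python sketch that counts them for a handful of scans. The function name `confusion_counts` and the six labelled scans are made up for illustration; they are not drawn from any real data set.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes for paired boolean labels (True = cancer)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["TP"] += 1      # model says cancer, cancer is present
        elif not guess and not truth:
            counts["TN"] += 1      # model says no cancer, none is present
        elif guess and not truth:
            counts["FP"] += 1      # model says cancer, none is present
        else:
            counts["FN"] += 1      # model says no cancer, cancer is present
    return counts

# Hypothetical ground truth and model output for six scans
actual = [True, True, False, False, False, True]
predicted = [True, False, False, True, False, True]

outcome = confusion_counts(actual, predicted)
print(outcome)
```

Every measure of a model's usefulness discussed in this column is some ratio of these four counts.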
One very soon begins to see that the training data used to “teach” a machine to “learn” on its own becomes crucial. This is why many startups promise to generate new data sets that can later be used to train an ML model. This “data exhaust” is presumed to be useful simply because it produces voluminous new data about a subject.
Not so fast, I tell these startups. Just because one can use ML, it doesn’t necessarily follow that the ML model is useful. Neither does it follow that a particular ML model is more effective than a different ML model.
The good news is that there are plenty of ways to gauge the effectiveness of an ML model, and they all boil down to the four types of predictions I described above (true positives, true negatives, false positives and false negatives).
The first of these is the prevalence of positives in the data set being used to train the model, and the accuracy of the model in picking those positives.
Let us say only 10% of 100,000 medical scans that the ML model is being fed to learn from actually indicate the presence of cancer. This number is important, since it gives us a base measure of what the ML model should be able to achieve on its own, after it has worked its way through the vast maze of positives, negatives, false positives and false negatives.
In a random pick from this data set, there is a 10% probability that the pick is positive and a 90% probability that it is negative. The startup’s ML model should be much more accurate than a random pick. However, the issue with this strict statistical measure of “accuracy” is that it includes both positives and negatives (the ML model should be accurate at predicting both).
This can present a problem, since the model could simply label every scan negative and, because negatives constitute 90% of the data set in this instance, still be 90% accurate. However, it would be useless, since this “accurate” model hasn’t been able to pick any of the cancer-positive scans.
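The arithmetic for a model that labels every scan negative can be checked in a few lines of Python. This is only a sketch using the 100,000-scan example; the variable names are my own.

```python
total_scans = 100_000
cancer_scans = 10_000            # 10% prevalence, as in the example

# A model that labels every scan negative never flags a positive.
tp, fp = 0, 0
fn = cancer_scans                # every real cancer case is missed
tn = total_scans - cancer_scans  # every healthy scan is labelled correctly

accuracy = (tp + tn) / total_scans
detected_share = tp / (tp + fn)  # share of real cancer cases found

print(f"accuracy: {accuracy:.0%}, cancers detected: {detected_share:.0%}")
```

The accuracy figure looks respectable only because negatives dominate the data set; the share of cancers detected shows what the model is actually worth.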
The second is the ML model’s precision. Precision is the share of the scans the model flags as positive that really are positive, that is, true positives divided by the sum of true positives and false positives. The startup’s ML model precision would have to be significantly greater than the prevalence of positives (10%) in the example. Otherwise the model is no better than a random pick at predicting an outcome.
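As a rough sketch of that comparison in Python: suppose, purely hypothetically, that a model flags 8,000 of the 100,000 scans and 6,000 of those flags turn out to be real cancer cases. The counts and the helper `precision` are illustrative assumptions, not results from any actual model.

```python
def precision(true_positives, false_positives):
    """Share of flagged scans that really are positive."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0

total_scans = 100_000
cancer_scans = 10_000
prevalence = cancer_scans / total_scans   # what a random pick would achieve

# Hypothetical model output: 8,000 scans flagged, 6,000 of them correctly
model_precision = precision(true_positives=6_000, false_positives=2_000)

print(f"prevalence: {prevalence:.0%}, precision: {model_precision:.0%}")
beats_random = model_precision > prevalence
```

A model whose precision sat near 10% here would be adding nothing beyond what the base rate already tells us.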
Now, let’s assume the model’s precision is 100%. The next measure will be its ability to collect true positives from the data set, which data scientists usually call recall. So, in a data set with 100,000 medical scans, with 10,000 instances of cancer (10%), the efficacy of the model’s collection depends on how many instances of cancer it detects.
Although its precision is now 100%, if it only detects 5,000 out of the 10,000 true instances, its collection rate is just 50%: it has missed the other 5,000, which are false negatives. These two measures trade off against each other. Loosening the model so that it flags more scans can increase its collection, but now, in addition to more than 5,000 true cases, it will also collect noise in the form of false positives, and its precision will fall.
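The trade-off can be seen in a toy Python example. I assume here that the model gives each scan a score between 0 and 1 and flags the scan as cancer when the score clears a threshold; the ten scored scans and the function `precision_and_collection` are invented for illustration.

```python
def precision_and_collection(scored_scans, threshold):
    """Return (precision, collection rate) when flagging scores >= threshold."""
    flagged = [is_cancer for score, is_cancer in scored_scans if score >= threshold]
    true_positives = sum(flagged)
    all_positives = sum(is_cancer for _, is_cancer in scored_scans)
    precision = true_positives / len(flagged) if flagged else 0.0
    collection = true_positives / all_positives
    return precision, collection

# Hypothetical (model score, scan really shows cancer) pairs
scans = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False),
    (0.40, True), (0.30, False), (0.20, False), (0.10, False), (0.05, False),
]

strict = precision_and_collection(scans, threshold=0.85)  # flags only the surest cases
loose = precision_and_collection(scans, threshold=0.35)   # flags many more scans

print("strict:", strict)
print("loose: ", loose)
```

With the strict threshold every flag is correct but half the cancer cases are missed; with the loose one every case is caught, at the cost of false alarms.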
So, what kind of ML model does a startup create with trade-offs between accuracy, collection capability and precision? That depends on the outcome that the model is trying to predict.
There are various other complexities when models deal with sensitive data. Predicting a repeat buy of a pair of jeans is very different from detecting cancer. Despite large “data exhausts”, sane professionals who understand the field need to come in and help train the model. We will need expert humans for a while yet.