Building Better Deep Learning Requires New Approaches Not Just Bigger Data
In its rush to solve all the world’s problems through deep learning, Silicon Valley is increasingly embracing the idea of AI as a universal solver that can be rapidly adapted to any problem in any domain simply by taking a stock algorithm and feeding it relevant training data. The problem with this assumption is that today’s deep learning systems are little more than correlative pattern extractors that search large datasets for basic patterns and encode them into software. While impressive compared to the standards of previous eras, these systems are still extraordinarily limited, capable only of identifying simplistic correlations rather than actually semantically understanding their problem domain. In turn, the hand-coded era’s focus on domain expertise, ethnographic codification and deeply understanding a problem domain has given way to parachute programming in which deep learning specialists take an off-the-shelf algorithm, shove in a pile of training data, dump out the resulting model and move on to the next problem. Truly advancing the state of deep learning and the way in which companies make use of it will require a return to the previous era’s focus on understanding problems rather than merely churning canned models off assembly lines.
In the era of hand-coded content understanding systems and hand-tuned classical statistical machine learning algorithms, building solutions required deeply understanding the problem domain. Programmers would work hand-in-hand with subject matter experts, deeply immersing themselves in the field, studying human practitioners with the precision and detail of an ethnographic study and even performing the task themselves to learn its complexities and nuances.
In contrast, today’s deep learning practitioners adhere to the utopian dream of galleries of canned models that can be simply plucked from a shelf, shoved full of raw training data from watching humans perform the task and then simply dropped in to take over, without their programmers needing to know a single thing about the problem each model is designed to solve.
While the idea of HAL 9000-like general intelligences capable of taking on any task represents the holy grail of AI research, we are very far from such systems even being on the horizon. Instead, today’s systems are more akin to glorified correlation engines that can perform a single task reasonably well, provided they are given properly curated training data.
Today’s deep learning algorithms are entirely dependent on the quality of their training data, since it represents the totality of their worldview and understanding of the problem domain.
This means that training data must be exquisitely curated, balanced to provide sufficient examples and counterexamples at the boundary points where the underlying learning algorithm struggles. The problem is that these boundary points are rarely well understood.
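To make the idea of balance at the boundary concrete, here is a minimal Python sketch, not a prescribed method. It assumes a binary classifier that outputs a score between 0 and 1, treats anything within a small margin of 0.5 as a boundary case, and counts how many examples of each label fall in that zone. The function name `boundary_balance`, the 0.5 threshold, the `margin` parameter and the sample scores are all hypothetical, chosen purely for illustration.

```python
from collections import Counter


def boundary_balance(scores, labels, margin=0.1):
    """Count true labels among examples whose score is within `margin` of 0.5.

    Assumption: `scores` are a binary model's predicted probabilities for
    class 1, and 0.5 is its decision threshold.
    """
    near_boundary = [y for s, y in zip(scores, labels) if abs(s - 0.5) <= margin]
    return Counter(near_boundary)


# Hypothetical model scores and true labels, for illustration only.
scores = [0.05, 0.45, 0.52, 0.58, 0.95, 0.48, 0.91]
labels = [0, 1, 1, 1, 1, 0, 1]

counts = boundary_balance(scores, labels)
print(counts)  # label counts among the four examples scored near 0.5
```

A lopsided count suggests the boundary region is short on counterexamples. Note that the check presupposes an already trained model to say where the boundary lies, so it can only flag imbalance in regions the model has already surfaced; it cannot replace an understanding of the problem domain.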