Open-Source Machine Learning Is Free, As In Beer
Machine learning (ML) continues to amaze us with its abilities and is set to transform the economic structure of many industries — from producers of widgets to financial analysts and health care providers. But many IT and operations practitioners are struggling to put the rapid advances in ML to work for their organizations.
To take advantage of ML in your environment, you should start with an understanding of the benefits you aim to achieve, such as improving efficiency, accuracy and safety, or cutting the cost of delivery of goods/services. Next, how will you get there? Roll your own? Integrate open-source code bases? Buy a product or service? Use a public cloud?
These decisions need to reflect the realities of the ML domain, your business needs and the skills you have available.
ML Is One Ingredient, Not The Whole Solution
Because ML is developing so fast, I believe solutions should be able to take advantage of the powerful techniques being made available as open source. Ensure your solution is “ML agile”: able to upgrade to newer algorithms easily. An application architecture that allows you to plug in the major open-source frameworks could save years of effort and get your solution into production more quickly.
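To make the plug-in idea concrete, here is a minimal sketch of one way such an architecture might look in Python. Everything in it is an assumption for illustration: the names `Model`, `register_backend`, `load_model` and `score` are hypothetical, and the two toy backends stand in for adapters that would, in a real system, wrap an open-source framework. The point is only that the application depends on a small interface, so a newer algorithm can be swapped in by registering another backend.

```python
from typing import Callable, Dict, Protocol, Sequence


class Model(Protocol):
    """Hypothetical interface: the only thing the application knows about a model."""

    def predict(self, features: Sequence[float]) -> float:
        ...


# Registry mapping a backend name to a factory that builds a Model.
_BACKENDS: Dict[str, Callable[[], Model]] = {}


def register_backend(name: str):
    """Decorator that records a model factory (here, a class) under a name."""

    def decorator(factory: Callable[[], Model]) -> Callable[[], Model]:
        _BACKENDS[name] = factory
        return factory

    return decorator


def load_model(name: str) -> Model:
    """Build the model registered under `name`, or fail with a clear error."""
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend {name!r}; known: {sorted(_BACKENDS)}")
    return _BACKENDS[name]()


# Toy stand-ins. A real adapter would wrap a framework model behind predict().
@register_backend("mean")
class MeanModel:
    def predict(self, features: Sequence[float]) -> float:
        return sum(features) / len(features)


@register_backend("max")
class MaxModel:
    def predict(self, features: Sequence[float]) -> float:
        return max(features)


def score(backend_name: str, features: Sequence[float]) -> float:
    """Application code: picks a backend by name, never imports one directly."""
    return load_model(backend_name).predict(features)
```

With this shape, moving from one algorithm to another is a configuration change (`score("mean", ...)` versus `score("max", ...)`) rather than a rewrite of the application.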
The principal challenge with adopting any ML-based solution is the effort needed to build and tune models to your environment — which demands highly skilled engineers/data scientists. For that reason, insist that your integrators manage a solution throughout its life, and don’t sign off on a bespoke app until you have proof that it works. Remember that you may not know if a solution delivers significant benefits until you have seen it work in practice for long enough to establish the operational costs of false positives and false negatives.
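As a rough illustration of what establishing those operational costs might involve, the sketch below totals the cost of errors over an observation period and subtracts it from the savings the solution delivered. The names (`ErrorCounts`, `error_cost`, `net_benefit`) and every number in the usage note are hypothetical; real per-error costs depend entirely on your business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCounts:
    """Errors observed in production over some period (hypothetical structure)."""

    false_positives: int  # e.g. alerts raised when nothing was wrong
    false_negatives: int  # e.g. real problems the model missed


def error_cost(counts: ErrorCounts, cost_per_fp: float, cost_per_fn: float) -> float:
    """Total operational cost of the model's mistakes for the period."""
    return (
        counts.false_positives * cost_per_fp
        + counts.false_negatives * cost_per_fn
    )


def net_benefit(
    gross_savings: float,
    counts: ErrorCounts,
    cost_per_fp: float,
    cost_per_fn: float,
) -> float:
    """Savings delivered by the solution minus what its errors cost you."""
    return gross_savings - error_cost(counts, cost_per_fp, cost_per_fn)
```

For example, with made-up figures of 120 false positives at 50 each and 8 false negatives at 2,000 each, `error_cost` comes to 22,000; against gross savings of 30,000, the net benefit is 8,000. A handful of expensive false negatives can outweigh a large number of cheap false positives, which is why a short trial may not tell you enough.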
The Power Of Open-Source
Today, the leading edge of innovation in ML algorithms and tools is to be found in open-source code bases that enjoy broad support. The community development model appeals to researchers, end users and developers – users can be sure they won’t be stranded with a dead-end proprietary stack, and developers can confidently invest time and effort into widely used code bases, developing skill sets that are portable across projects, employers and even clouds. The open-source ML tools are not only the leading edge of algorithm development but also embody the de facto work practice of many data scientists.
Worth noting is that the near-universal collaboration on a common set of open-source tools does not commoditize ML per se. Sure, the code is free, but there has been a sea change in the community development model: Leading researchers and practitioners pool their efforts to deliver a common code base of great value, freely available to all. It’s a fascinating trend that also serves the competitive interests of major contributors – like Google, Amazon and Microsoft – that gain advantage by ensuring competitors with proprietary solutions cannot keep up. For the cloud providers, ML-based workloads are a great way to monetize their cloud infrastructure, from storage through CPU, memory and GPU or tensor processing unit (TPU) resources.
Finally, note that the free availability of algorithms has not killed the value chain. Chip vendors, including Google (with its TPU), NVIDIA and more than 40 startups working on hardware acceleration for ML, aim to monetize resource-hungry training and inference workloads with proprietary acceleration hardware for clouds or on-prem devices.
What’s Next For ML?
Successful open-source projects attract developers and researchers, and successful open-source ML projects become focal points of innovation for the industry, advancing the state of the art and delivering the power of collective contributions to all stakeholders. Contrast the strong community support for Google’s TensorFlow with the almost complete absence of a community around the proprietary IBM Watson. The integration of TensorFlow into a comprehensive set of consumer and enterprise solutions will build preference for Google services and TPUs, keep developers focused on Google technologies and give Google immense bragging rights – it can promote itself in every ML success its community achieves.
Cloud providers have massive marketing budgets and immense reach, and they already use ML to differentiate their packaged services – embedding AI smarts into applications they monetize via a subscription licensing model. This approach saves customers from having to understand the technology. Providers also win by incorporating ML-powered features that quickly make their way into SaaS apps, building strong affinity with the user.
Open-source ML is fuel for a race favoring execution and development efficiency. Winners will capitalize on the ready availability of powerful tools to deliver economic benefits quickly and at a reasonable cost. Those that adhere to the mantra of proprietary secret sauce may succeed tactically, but I believe they are doomed to eventual failure through slower adoption and weaker developer support.
Although the algorithms in major open-source frameworks form an immensely valuable community commons, it is the complete solution that is of value for your use case. There is plenty of room for proprietary innovation – delivering vertically integrated packages for specific industries, and infrastructure and packaging that make these powerful technologies simple enough for people who are not data scientists to adopt and scale easily.