Upgrade & Secure Your Future with DevOps, SRE, DevSecOps, MLOps!

We spend hours on Instagram and YouTube and waste money on coffee and fast food, but won’t spend 30 minutes a day learning skills to boost our careers.
Master in DevOps, SRE, DevSecOps & MLOps!

Learn from Guru Rajesh Kumar and double your salary in just one year.

Get Started Now!

How Machine Learning Will Disrupt The Established Cloud Providers

Source – forbes.com

In the past few years, new categories of products have emerged thanks to the extraordinary advances in machine learning (ML) and deep learning (DL). These new techniques power product recommendations, computer-aided diagnosis in medical imaging and self-driving cars, just to name a few.

Most ML and DL algorithms require compute profiles (hardware, software, storage, networking) that are significantly different from those optimized for traditional applications. Consequently, as more and more companies develop their own ML/DL solutions and deploy them to production, the demand for the ML-optimized compute resources will grow dramatically and create opportunities for new entrants to offer solutions that compete with today’s dominant cloud providers: Amazon AWS, Microsoft Azure and Google Cloud.

The ML/DL Cloud Is Different

In an article on Mesosphere’s blog page, Edward Hsu presented the case that web applications are now primarily data-driven. Consequently, a new set of frameworks (a.k.a. stacks), namely SMACK (Spark, Mesos, Akka, Cassandra, Kafka), must replace the traditional LAMP (Linux, Apache, MySQL, PHP) stack used to build web-based applications. In my view, rather than replacing LAMP, SMACK will coexist side by side with, and feed data to, traditional web-based based frameworks, which are still needed to present nice-looking webpages and interface with mobile phones.

Yet the main point is well-taken. We need to update Marc Andreesen’s famous line about how “Software is eating the world” to “Data is eating the world.” Let’s unpack this statement and derive the consequences.

Hardware

The disruption created by machine learning and deep learning extends well beyond the software stack into chips, servers and cloud providers. This disruption is rooted in the simple fact that GPUs are much more efficient processors for ML and DL than traditional CPUs.

Up until recently, the solution was to augment traditional servers with GPU add-on cards. We are now at a point where demand for ML/DL computing is such that special-purpose servers, optimized for ML/DL compute loads, are being built.

Data centers are also being re-architected to support the extremely large amount of data consumed by ML and DL. Imagine you are designing the brains for self-driving cars. You need to process thousands and thousands of hours of video (and other such signals as GPS, gyroscopes, LIDAR) to train your algorithms. The amount of data that a Tesla on the road records in one second is a million times larger than a tweet or a post on Facebook.

ML/DL data centers thus require both huge amounts of storage and extremely high bandwidth.

Software

The software side is even more complex. A new infrastructure stack, typically using machine learning-specific frameworks such as Tensorflow (originally developed by Google) or PyTorch (originally developed at Facebook), is required to shepherd data around and manage the execution of the compute jobs. Furthermore, open-source code libraries (pandas, scikit-learn, matplotlib) are used to implement the models (e.g., neural networks, data displays). These model libraries are critical because they are optimized to be both easy to use for algorithm research and offer high performance for use in production.

Finally, each vendor offers complete building blocks for specific use cases. For example, Amazon Lex, Google Cloud Speech and Microsoft Bing Speech provide speech recognition and can even recognize intent. Each has its own API and unique behavior, making the migration from one vendor to the other time-consuming.

New Entrants

In addition to the Big Three cloud providers (Amazon AWS, Microsoft Azure and Google Cloud) that have offered GPU-accelerated instances for a few years, new ML-optimized offerings have emerged:

• NVIDIA, which is already the dominant provider of GPUs that power the graphics cards that drive computer displays, recently introduced a portfolio of “purpose-built AI supercomputers” servers known as its DGX systems.

Servers.com offers its Prisma Cloud with dedicated GPU-optimized servers.

• Rescale, one of the niche cloud providers that focuses on high-performance computing (HPC), just announced the availability of the latest generation of GPU-powered servers, along with high-bandwidth interconnect, to create high-performance multi-node clusters.

What’s At Stake

The Big Three cloud providers are the ones most immediately at risk to be disrupted by new entrants such as NVIDIA, Servers.com and Rescale. ML/DL innovation is still running at a torrid pace thanks to innovation in algorithms as well as compute efficiency. This is creating a small arms race where end users are constantly looking for the provider that can give that extra edge.

On one hand, end users are benefiting hugely from this arms race to provide the best software and hardware compute environment. On the other, this requires constant vigilance to keep abreast of the latest offerings. Even more importantly, when deploying ML/DL products to production, CEOs and CTOs need to pick the winner — or at least a future survivor — that will keep their edge for the next two to five years. This is not an easy task.

We will delve deeper into these two topics in future posts — stay tuned.

Related Posts

What is Deep Learning and Why is Deep Learning Important?

What is Deep Learning? Deep Learning is a subfield of machine learning that attempts to model high-level abstractions in data by using multiple processing layers with complex Read More

Read More

What is Machine Learning and what are the Types of Machine Learning Tools Available?

What is Machine Learning? Machine Learning is a subfield of Artificial Intelligence that incorporates statistical models and algorithms to help computer systems learn from data and improve Read More

Read More

What is an Autonomous System and what are Applications of Autonomous Systems?

Introduction to Autonomous Systems Autonomous systems, once the stuff of science fiction, have become a reality in our world today. From self-driving cars to drones, robots, and Read More

Read More

What is Predictive Analytics and what is the Types of Predictive Analytics Tools

Introduction to Predictive Analytics Tools As businesses continue to collect vast amounts of data, it becomes increasingly challenging to make informed decisions that drive growth and improve Read More

Read More

What is Neural Network Libraries and What are the popular neural network libraries available today?

1. Introduction to Neural Network Libraries Neural networks are being used more and more in today’s technology landscape, powering everything from image recognition algorithms to natural language Read More

Read More

What is Reinforcement Learning and What are Reinforcement Learning Libraries?

Introduction to Reinforcement Learning Reinforcement learning is a machine learning technique that involves training an agent to make decisions based on trial and error. It is an Read More

Read More
Subscribe
Notify of
guest
2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
2
0
Would love your thoughts, please comment.x
()
x