<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Virtualized Archives - Artificial Intelligence</title>
	<atom:link href="https://www.aiuniverse.xyz/tag/virtualized/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aiuniverse.xyz/tag/virtualized/</link>
	<description>Exploring the universe of Intelligence</description>
	<lastBuildDate>Fri, 08 May 2020 12:11:26 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>NEWS Virtualized GPUs Target Deep Learning Workloads on Kubernetes</title>
		<link>https://www.aiuniverse.xyz/news-virtualized-gpus-target-deep-learning-workloads-on-kubernetes/</link>
					<comments>https://www.aiuniverse.xyz/news-virtualized-gpus-target-deep-learning-workloads-on-kubernetes/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Fri, 08 May 2020 12:11:24 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Artificial intelligence (AI)]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[Virtualized]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=8677</guid>

					<description><![CDATA[<p>Source: virtualizationreview.com Israel-based Run:AI, specializing in virtualizing artificial intelligence (AI) infrastructure, claimed an industry first in announcing a fractional GPU sharing system for deep learning workloads on <a class="read-more-link" href="https://www.aiuniverse.xyz/news-virtualized-gpus-target-deep-learning-workloads-on-kubernetes/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/news-virtualized-gpus-target-deep-learning-workloads-on-kubernetes/">NEWS Virtualized GPUs Target Deep Learning Workloads on Kubernetes</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: virtualizationreview.com</p>



<p>Israel-based Run:AI, specializing in virtualizing artificial intelligence (AI) infrastructure, claimed an industry first in announcing a fractional GPU sharing system for deep learning workloads on Kubernetes.</p>



<p>The company offers a namesake Run:AI platform built on top of Kubernetes to virtualize AI infrastructure in order to improve on the typical bare-metal approach that statically provisions AI workloads to data scientists. The firm says that approach comes with limits on experiment size and speed, low GPU utilization, and lack of IT controls.</p>



<p>Creating a virtual pool of GPU (graphics processing unit) resources, the company says, abstracts data science workloads from the infrastructure and simplifies workflows.</p>



<p>In an announcement today (May 6), Run:AI said its fractional GPU system lets data science and AI engineering teams run multiple workloads simultaneously on a single GPU, helping organizations run more workloads such as computer vision, voice recognition and natural language processing on the same hardware, lowering costs.</p>



<p>To overcome some limitations in how Kubernetes handles GPUs, the company resorted to some tricky math, effectively representing GPUs as floats that can be fractionalized for use in containers, rather than as integers that either exist or don&#8217;t.</p>



<p>&#8220;Today’s de facto standard for deep learning workloads is to run them in containers orchestrated by Kubernetes,&#8221; the company said. &#8220;However, Kubernetes is only able to allocate whole physical GPUs to containers, lacking the isolation and virtualization capabilities needed to allow GPU resources to be shared without memory overflows or processing clashes.&#8221;</p>
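<p>As an illustration only (this is not Run:AI&#8217;s actual implementation, and the function name and packing policy are hypothetical), the difference between integer and fractional allocation can be sketched as a simple first-fit packer that treats each GPU&#8217;s capacity as a float:</p>

```python
# Hypothetical sketch: Kubernetes extended resources are integers, so a pod
# either receives a whole GPU or none. Treating capacity as a float instead
# lets a scheduler pack several small requests onto one physical device.

def pack_jobs(gpu_capacities, requests):
    """First-fit packing of fractional GPU requests onto physical GPUs.

    gpu_capacities: free capacity per GPU (1.0 == one whole GPU).
    requests: fractional GPU requests, e.g. 0.25 for a quarter of a GPU.
    Returns a list of (request, gpu_index) assignments, or raises if a
    request cannot be placed anywhere.
    """
    free = list(gpu_capacities)
    assignments = []
    for req in requests:
        for i, cap in enumerate(free):
            if cap >= req:
                free[i] = cap - req
                assignments.append((req, i))
                break
        else:
            raise RuntimeError(f"no GPU has {req} capacity free")
    return assignments

# Eight 0.125-GPU inference jobs fit on a single physical GPU under this
# model, whereas whole-integer allocation would need eight separate GPUs.
assignments = pack_jobs([1.0], [0.125] * 8)
```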



<p>The result of the company&#8217;s work to overcome that limitation is virtualized logical GPUs &#8212; sporting their own memory and compute space &#8212; that appear as self-contained processors to containers.</p>



<p>This is especially useful for lightweight workloads, including inference: eight or more containerized jobs can share the same physical chip, while typical use cases allow for only two to four jobs per GPU.</p>
<p>The post <a href="https://www.aiuniverse.xyz/news-virtualized-gpus-target-deep-learning-workloads-on-kubernetes/">NEWS Virtualized GPUs Target Deep Learning Workloads on Kubernetes</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/news-virtualized-gpus-target-deep-learning-workloads-on-kubernetes/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Virtualized AI: Deep Learning Needs More than Just More Compute Power</title>
		<link>https://www.aiuniverse.xyz/virtualized-ai-deep-learning-needs-more-than-just-more-compute-power/</link>
					<comments>https://www.aiuniverse.xyz/virtualized-ai-deep-learning-needs-more-than-just-more-compute-power/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 25 Apr 2020 12:14:31 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Virtualized]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=8366</guid>

					<description><![CDATA[<p>Source: enterpriseai.news Is the recent progress in deep learning true artificial intelligence? A widely-discussed article by Google’s Francois Chollet discusses the skill acquisition-based approach to gathering intelligence – the one currently in <a class="read-more-link" href="https://www.aiuniverse.xyz/virtualized-ai-deep-learning-needs-more-than-just-more-compute-power/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/virtualized-ai-deep-learning-needs-more-than-just-more-compute-power/">Virtualized AI: Deep Learning Needs More than Just More Compute Power</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: enterpriseai.news</p>



<p>Is the recent progress in deep learning true artificial intelligence? A widely discussed article by Google’s Francois Chollet examines the skill-acquisition-based approach to measuring intelligence – the one currently in use in modern DL. He argues that, with huge data sets available for training models, AI is mastering skill acquisition but not necessarily the “scope, generalization difficulty, priors, and experience” that true AI should incorporate. Even with our progress in AI, and specifically DL, we are nowhere near the limits of what DL can achieve with bigger, better-trained, more accurate models – those that take into account not only skill but experience, and the generalization of that experience.</p>



<p>Understandably, this has put intense focus on computing power, particularly the hardware that enables data scientists to run complex training experiments. Nvidia increasingly sees DL as a key market for its GPUs and bought Mellanox to speed communication inside a GPU cluster. With its recent acquisition of Habana, Intel is likely betting that custom AI accelerator hardware is a better match. Other AI-first hardware includes Cerebras’s massive chip in a custom box that’s designed for the specific types of intensive, long-running workloads that training DL models requires. In the cloud, Google’s Tensor Processing Units offer another bespoke option.</p>



<p>For companies running their own DL workloads, more compute is generally better. Whether exotic AI accelerators or tried-and-tested GPUs, quicker model training means more iterations, faster innovation and reduced time-to-market. It may even mean we can achieve “strong” AI (i.e., AI that goes beyond “narrow AI,” which is the capability of doing a single, discrete task) sooner. In 2020, continuing the trend of recent years, companies will invest in ever more AI hardware in an effort to satisfy data scientists’ demands for compute to run bigger models to solve more complex business problems.</p>



<p>But hardware isn’t the whole picture. The conventional computing stack – from processor to firmware to virtualization, abstraction, orchestration and operating layers through to end-user software – was designed for traditional workloads, prioritizing high-availability, short-duration operations.</p>



<p>Training a DL model, though, is the opposite of this sort of workload. While training, an experiment may need 100 percent of the computing power of one or more processors for hours or even days at a time.</p>



<p>Part of the challenge is that, while developing a DL algorithm, data scientists have two basic use patterns for compute resources. The first phase of development is building a model, which includes writing new code and debugging it until the model is ready. During this phase, they tend to use a single GPU frequently, but only for short periods of time.</p>



<p>The second phase is training, where the model consumes all the training data and adjusts its parameters. This might consume more than a single GPU and could even take a whole cluster working for days. Sometimes data scientists want to try training a few variations of the same model in parallel to see which performs better.</p>



<p>In a large company, computing resources for DL are typically provided by the IT department. Perhaps each data scientist is statically allocated a fixed amount of physical resources, say a GPU or two for building and training models. Inevitably, this means that expensive processors are sitting idle. Alternatively, a data science team might share their processing power and have to squabble over who gets to tie up the Nvidia DGX AI supercomputer for three days and who has to wait their turn.</p>



<p>All of this also creates challenges for enterprise IT. The IT department has limited visibility into how data science teams are using their expensive compute resources. Meanwhile, the C-suite doesn’t really understand how their GPU resources are being used and whether that usage matches their business goals. Should they invest money in more hardware? Should they hire more data science teams? Or is the issue in the workflow, with&nbsp;<em>both&nbsp;</em>idle resources and data scientists who, unable to utilize them, must wait for compute time?</p>



<p>Every minute a GPU or AI accelerator is idle is an opportunity cost. IT departments face under-utilization of their GPUs while data science teams see their productivity suffer because, from their point of view, the hardware is ‘in use’ and a new model can’t be trained until the current job finishes. If unused GPU capacity could be put to work, it would allow faster model training, more iterations and faster time to market.</p>



<p>This is the challenge that companies are beginning to face. Better hardware and more of it might well be necessary, but it isn’t sufficient if the software stack isn’t set up to also make efficient and effective use of that hardware.</p>



<p>The fundamental question of how to share hardware efficiently isn’t new. Some of the challenges that data scientists face could be solved by looking again at how virtualization solved this problem in traditional computing.</p>



<p>Traditional computing uses virtualization to share a single physical resource between multiple workloads. But what if instead of sharing a single physical resource, virtualization was used to create a pool of resources, allowing DL projects to consume as much of the shared resources as they need in an elastic, dynamic way? A virtualized AI infrastructure for DL would run a single workload on multiple shared physical resources. Ideally these resources could be dynamically allocated to the experiments that need them the most, allowing IT administrators to manage resources efficiently, reducing idle GPU time and increasing cluster utilization.</p>
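<p>The pooled, elastic model described above can be sketched in a few lines (the class and policy here are hypothetical, not any vendor’s API): jobs draw GPUs from a shared pool as they need them and return them when done, instead of holding static per-user allocations that sit idle between runs.</p>

```python
# Hypothetical sketch of an elastic GPU pool. An elastic job takes whatever
# capacity is currently free rather than blocking on a fixed allocation,
# and releases it back to the pool when its phase ends.

class GPUPool:
    def __init__(self, total_gpus):
        self.free = total_gpus

    def acquire(self, n):
        """Grant up to n GPUs; an elastic job takes what is available."""
        granted = min(n, self.free)
        self.free -= granted
        return granted

    def release(self, n):
        """Return GPUs to the shared pool for other experiments."""
        self.free += n

pool = GPUPool(total_gpus=8)
build = pool.acquire(1)    # interactive build phase: one GPU, short bursts
train = pool.acquire(16)   # training asks for 16, elastically gets the rest
pool.release(build)        # build phase ends; its GPU rejoins the pool
```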



<p>The software stack for DL needs to evolve along with the chips, both to get the most out of individual training experiments and to better optimize running multiple experiments in parallel. Companies will need a full-stack, AI-first solution that accounts for the needs of both DL workloads and, critically, DL organizations.</p>
<p>The post <a href="https://www.aiuniverse.xyz/virtualized-ai-deep-learning-needs-more-than-just-more-compute-power/">Virtualized AI: Deep Learning Needs More than Just More Compute Power</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/virtualized-ai-deep-learning-needs-more-than-just-more-compute-power/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
