Scientific Machine Learning and HPC-AI Technology Convergence

Source – https://insidehpc.com/

Some of the best-known applications of machine learning techniques in science are the detection and classification of gravitational-wave signals from LIGO and Virgo in astrophysics [1], DeepMind's AlphaFold 2, which outperformed classical methods in protein folding [2], and Deep Potential Molecular Dynamics, the work that won the 2020 Gordon Bell Prize [3], which opens new avenues in the drug design process and could speed up future pandemic response efforts.

Beyond these key examples, the convergence between HPC and AI is a natural one. DL-based surrogate modelling is increasingly applied in research, and recent advances in physics-informed neural networks such as HNN [4] bring physical properties and constraints into neural network loss functions, opening a path towards a new generation of simulation.
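To make the idea of a physics constraint in a loss function concrete, here is a minimal sketch in Python with NumPy. It is not the HNN method of [4]; it only illustrates the general principle, assuming a toy system (the harmonic oscillator u'' + omega^2 u = 0) and a finite-difference residual. The function name physics_informed_loss and the weight parameter are hypothetical choices made for this illustration.

```python
import numpy as np

def physics_informed_loss(u_pred, u_obs, obs_idx, t, omega, weight=1.0):
    """Data misfit plus the mean squared residual of u'' + omega^2 u = 0.

    Assumes t is a uniform grid and u_pred holds the model output on that grid.
    """
    dt = t[1] - t[0]
    # Second time derivative by central finite differences (interior points only).
    u_tt = (u_pred[2:] - 2.0 * u_pred[1:-1] + u_pred[:-2]) / dt**2
    residual = u_tt + omega**2 * u_pred[1:-1]
    data_term = np.mean((u_pred[obs_idx] - u_obs) ** 2)
    physics_term = np.mean(residual**2)
    return data_term + weight * physics_term

omega = 1.0
t = np.linspace(0.0, 2.0 * np.pi, 2001)
obs_idx = np.arange(0, t.size, 200)      # sparse observations, every 0.2*pi
u_obs = np.cos(omega * t[obs_idx])

# Candidate 1: the true solution, which fits the data and obeys the ODE.
u_physical = np.cos(omega * t)
# Candidate 2: fits every observation (sin(5t) vanishes at the observation
# times) but violates the ODE between them.
u_unphysical = np.cos(omega * t) + 0.5 * np.sin(5.0 * t)

loss_physical = physics_informed_loss(u_physical, u_obs, obs_idx, t, omega)
loss_unphysical = physics_informed_loss(u_unphysical, u_obs, obs_idx, t, omega)
print(loss_physical, loss_unphysical)
```

Both candidates match the sparse observations, so a pure data loss cannot tell them apart. The physics term is what penalizes the second candidate. In a real physics-informed network the derivative would usually come from automatic differentiation rather than finite differences.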

Atos Centers of Excellence – combining data sciences and industry expertise

At Atos, we have built a dedicated approach to support the scientific community and industry by bringing together data science and HPC expertise through the Atos Centers of Excellence. Each center is oriented towards a specific domain in which our experts and our customers jointly develop innovations and technologies, with the support of some of our partners.

Some of the first Atos Centers of Excellence are dedicated to weather forecasting and climate change [5] and to life sciences [6]. In such advanced domains, applying innovation means building strong AI research support, defined through specific programs. These programs aim to advance the state of the art of machine learning models for science applications and to explore the coupling of simulation with AI model inference, including the orchestration of AI-augmented HPC workflows, so that these developments can be brought into production at scale. The use cases we are addressing include surrogate modelling and dimensionality reduction for CFD applications, and data assimilation and deep learning for chaotic systems.

AI for science applications – more options to come!

In addition to this approach, bringing AI into science applications also means adapting the technical architecture of such converged systems. Today most HPC applications benefit from SIMD acceleration through GPU technologies, and AI workloads benefit from the same acceleration. GPU architectures have evolved over time, dedicating a growing share of their silicon to AI processing, which reinforces this convergence [7]. The growing maturity of dedicated AI chips such as the Graphcore IPU or Intel Habana could become a new keystone for science, thanks to the performance gains these technologies could provide.

Nevertheless, adoption of dedicated AI technologies is currently limited because code hybridization between AI and HPC applications is not yet mature enough to exploit dedicated AI acceleration. Hardware acceleration for HPC platforms has to support both HPC applications and AI workloads. This is a key limiting factor for dedicated AI chips today: AI chipmakers build their software environments on AI frameworks with dedicated back-end compilers and runtimes, so the associated software stack does not naturally support legacy codes and their programming models (MPI/OpenMP). Despite this challenge, promising examples of using AI chips for HPC applications have recently been published, for example on the Graphcore IPU, but at the cost of reformulating the algorithm or focusing on a kernel that naturally exposes a tensor representation which dedicated hardware can accelerate [8].
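As an illustration of what "reformulating an algorithm to expose a tensor representation" can mean, the sketch below rewrites a classic HPC stencil loop as a single matrix product. This is not the approach published in [8]; the 1D Laplacian, the function names and the problem sizes are assumptions chosen to keep the example small, and it runs on plain NumPy rather than on any AI chip.

```python
import numpy as np

def laplacian_loop(u, h):
    """Legacy-style kernel: explicit loop over interior grid points."""
    out = np.zeros_like(u)
    for i in range(1, u.size - 1):
        out[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / h**2
    return out

def laplacian_operator(n, h):
    """Same stencil written as a dense n x n matrix (boundary rows left at zero)."""
    op = np.zeros((n, n))
    idx = np.arange(1, n - 1)
    op[idx, idx - 1] = 1.0
    op[idx, idx] = -2.0
    op[idx, idx + 1] = 1.0
    return op / h**2

n, h, n_fields = 64, 0.1, 8
rng = np.random.default_rng(0)
fields = rng.standard_normal((n, n_fields))   # 8 independent 1D fields as columns

# Tensor formulation: one matrix product processes every field at once.
batched = laplacian_operator(n, h) @ fields

# Reference: apply the loop kernel field by field.
reference = np.stack(
    [laplacian_loop(fields[:, k], h) for k in range(n_fields)], axis=1
)
print(np.allclose(batched, reference))
```

The matrix form does redundant work, since most entries of the operator are zero, but it maps directly onto the matrix-multiply units that AI accelerators are built around. That trade-off between extra arithmetic and hardware fit is the kind of cost the reformulation implies.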

Large-scale AI infrastructure brings competitive advantage for HPC-AI convergence

HPC-AI convergence is also often seen as the use of HPC knowledge to strengthen AI. Deep learning model complexity continues to grow year after year, especially in natural language processing, where a transformer model such as GPT-3 has 175B parameters to train and a training compute cost on the order of 10²³ FLOPs [9-10]. Dedicated large-scale AI infrastructure is therefore becoming crucial for enterprise competitiveness. Its architecture differs from that of HPC-AI converged platforms in the trade-off between general-purpose processing and AI hardware acceleration: it leans towards more AI acceleration, with increased interconnect capability per node.
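The order of magnitude of the training cost can be checked with a back-of-the-envelope calculation. The sketch below uses the common rule of thumb that training a dense transformer costs roughly 6 floating-point operations per parameter per training token. This rule, and the figures of 175 billion parameters and about 300 billion training tokens, come from the GPT-3 literature rather than from this article. The cluster size, peak rate and utilization are purely hypothetical values, chosen to show why infrastructure scale matters.

```python
SECONDS_PER_DAY = 86_400

def training_flops(n_params, n_tokens):
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

def training_days(total_flops, n_accelerators, peak_flops_per_s, utilization):
    """Wall-clock days, assuming a constant sustained fraction of peak."""
    sustained = n_accelerators * peak_flops_per_s * utilization
    return total_flops / sustained / SECONDS_PER_DAY

# Published GPT-3 scale: 175e9 parameters, about 300e9 training tokens.
gpt3_flops = training_flops(175e9, 300e9)

# Hypothetical cluster (assumption): 1024 accelerators, 100 TFLOP/s peak each,
# 40% sustained utilization.
days = training_days(gpt3_flops, 1024, 100e12, 0.40)

print(f"estimated training cost: {gpt3_flops:.2e} FLOPs")
print(f"estimated wall-clock time: {days:.0f} days")
```

The estimate lands at about 3 x 10²³ FLOPs, consistent with the order of magnitude quoted above. Even on a thousand accelerators, the hypothetical run takes months, which is why interconnect and sustained utilization weigh so heavily in the design of dedicated AI infrastructure.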

Turnkey solutions – putting AI solutions at your fingertips!

At Atos, we bring decades of experience in large-scale computing to make scalable AI solutions available to any industry, supporting AI business growth and reaching a new level of performance. Our AI turnkey solutions integrate AI technologies at scale, management software, a data science platform, and AI and HPC expertise. They also include ML-based software that enhances HPC operations by learning the behavior of each production system, addressing data movement, energy consumption, resource allocation and predictive maintenance to improve operational excellence and total cost of ownership.

To conclude, artificial intelligence applied to science applications is a promising field of research that will bring great scientific breakthroughs in the future, and the Atos HPC-AI team is proud to support this leading-edge research topic.
