Researchers detail technique that enables attackers to ‘steal’ reinforcement learning algorithms

Source: venturebeat.com

A team of researchers at Nanyang Technological University in Singapore claims deep reinforcement learning (DRL) algorithms — algorithms that have been used to predict the shapes of proteins and teach robots to grasp objects — are prone to adversarial attacks that can extract and replicate them, enabling malicious actors to “steal” them. In a preprint paper, the coauthors describe a technique that targets black-box models whose inputs and operations aren’t publicly exposed, and that purportedly recovers DRL models with “[very] high fidelity.”

DRL has gained currency in part thanks to its ability to handle complex tasks and environmental interactions. It incorporates deep learning architectures and reinforcement learning algorithms to build sophisticated policies, which can understand an environment’s context (states) and make optimal decisions (actions). But as DRL increasingly makes its way into commercialized products like Mobileye’s and Wayve’s advanced driver assistance systems, it risks becoming a target for adversaries intent on IP theft or potentially harmful reverse-engineering.
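To make the states-to-actions framing concrete, here is a minimal sketch of what a DRL policy looks like in code. The network size and the Cart-Pole dimensions are illustrative assumptions, not details from the paper.

```python
# Minimal illustrative sketch (not from the paper): a DRL policy is a function
# mapping environment states to actions. Here a small network maps Cart-Pole's
# 4-dimensional state to a distribution over its 2 possible actions.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        # Probability of each action given the state.
        return torch.softmax(self.net(state), dim=-1)

policy = Policy()
state = torch.randn(4)                        # stand-in for an observed state
action = torch.argmax(policy(state)).item()   # the action the policy prefers
```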

The researchers’ approach assumes some knowledge of the target DRL model’s domain (i.e., the task the model performs, its context, and the formats of its inputs and outputs) and that an attacker can set environmental states and observe the model’s corresponding actions (a sketch of this query interface follows the list below). Their attack proceeds in two stages:

  1. A classifier predicts the training algorithm of a given black-box DRL model from its action sequences.
  2. Informed by the extracted algorithm, an imitation learning method generates and fine-tunes a model whose behavior mirrors the target’s.
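A minimal sketch of the query loop that the attacker’s assumed access implies is below. The function and argument names are hypothetical, and the environment calls follow the classic OpenAI Gym interface; the paper’s exact setup may differ.

```python
# Hedged sketch of the assumed query interface: the attacker drives the
# environment, feeds each state to the black-box target, and records the
# action it returns. `target_model` is any callable state -> action; the
# env.reset()/env.step() calls follow the classic OpenAI Gym API (4-tuple step).
def collect_action_sequence(target_model, env, episode_len=100):
    """Record the target's state-action sequence over one query episode."""
    sequence = []
    state = env.reset()
    for _ in range(episode_len):
        action = target_model(state)          # black-box query: state in, action out
        sequence.append((state, action))
        state, reward, done, info = env.step(action)
        if done:
            state = env.reset()
    return sequence
```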

The classifier is first trained on a large pool of “shadow” DRL models, each built with one of the candidate training algorithms. Covering every algorithm under consideration, the researchers train shadow models for each algorithm across several environments and evaluate their performance, then collect the state-action sequences of the best-performing models and turn them into labeled samples (the sequences are the features, the training algorithm is the label). Once trained, the classifier identifies the target model’s training algorithm from its recorded sequences, and that extracted algorithm is passed to the second stage for refinement.
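The sketch below shows the shape of such a stage-one classifier. The candidate algorithm names, the feature flattening, and the random-forest choice are illustrative assumptions, not the paper’s configuration.

```python
# Hedged sketch of the stage-1 classifier: shadow-model state-action sequences
# are the features and the algorithm that trained each shadow model is the
# label. Algorithm names and the random-forest choice are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CANDIDATE_ALGORITHMS = ["DQN", "A2C", "PPO"]   # placeholder candidate set

def sequence_to_feature(sequence):
    """Flatten a fixed-length list of (state, action) pairs into one vector."""
    return np.concatenate([np.append(state, action) for state, action in sequence])

def train_algorithm_classifier(labeled_sequences):
    """labeled_sequences: list of (sequence, algorithm_name) pairs from shadow models."""
    X = np.stack([sequence_to_feature(seq) for seq, _ in labeled_sequences])
    y = [algo for _, algo in labeled_sequences]
    clf = RandomForestClassifier(n_estimators=200)
    clf.fit(X, y)
    return clf

# At attack time, the classifier labels the target's recorded sequence:
#   algo = clf.predict(sequence_to_feature(target_sequence).reshape(1, -1))[0]
```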

The second stage, the imitation learning stage, employs GAIL, a model-free learning algorithm that imitates complex behaviors in large-scale, high-dimensional environments. Two models are constructed to contest with each other during the imitation process: a generative DRL model built with the extracted algorithm and a discriminative model. The generative model iteratively refines its parameters based on the discriminator’s feedback until the discriminator can no longer distinguish its behavior from the target model’s, a process that repeats until the attacker obtains a model with performance similar to that of the target.
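A minimal sketch of one GAIL-style adversarial step, reusing the Policy sketch above, is shown below. The Discriminator class, the surrogate reward, and the plain policy-gradient update are simplifications and assumptions, not the paper’s exact procedure.

```python
# Hedged GAIL-style sketch: the discriminator scores (state, action) pairs, and
# the imitator is rewarded for pairs the discriminator mistakes for the target's.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, states, actions_onehot):
        # Probability that a (state, action) pair came from the target model.
        return self.net(torch.cat([states, actions_onehot], dim=-1))

def gail_step(policy, disc, expert_batch, imitator_batch, d_optim, p_optim):
    """One adversarial update; each batch is (states, one-hot actions)."""
    exp_s, exp_a = expert_batch
    imi_s, imi_a = imitator_batch

    # 1) Discriminator update: target pairs toward 1, imitator pairs toward 0.
    d_loss = -(torch.log(disc(exp_s, exp_a) + 1e-8).mean()
               + torch.log(1 - disc(imi_s, imi_a) + 1e-8).mean())
    d_optim.zero_grad(); d_loss.backward(); d_optim.step()

    # 2) Imitator update: policy gradient on the surrogate reward log D(s, a),
    #    which is large when the discriminator is fooled by imitator pairs.
    with torch.no_grad():
        reward = torch.log(disc(imi_s, imi_a) + 1e-8).squeeze(-1)
    log_prob = torch.log((policy(imi_s) * imi_a).sum(dim=-1) + 1e-8)
    p_loss = -(reward * log_prob).mean()
    p_optim.zero_grad(); p_loss.backward(); p_optim.step()
```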

In experiments, the researchers applied their approach to two popular benchmarks in OpenAI’s Gym software: Cart-Pole and Atari Pong. For each environment, they selected 50 trained models, which resulted in 250 trained DRL models that produced 12,500 sequences of actions.
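For a sense of how such sequences are gathered on one of these benchmarks, here is a hedged usage sketch of the query loop from earlier on Cart-Pole. The environment id, the classic Gym API, and the random stand-in for the black-box target are assumptions; the Atari Pong setup is analogous but requires Gym’s Atari extras.

```python
# Hedged usage sketch on the Cart-Pole benchmark named above. The lambda stands
# in for the black-box target model the attacker would actually query.
import gym

env = gym.make("CartPole-v0")                 # classic gym API (4-tuple step)
random_stand_in = lambda state: env.action_space.sample()
sequence = collect_action_sequence(random_stand_in, env)   # from the earlier sketch
print(len(sequence), "state-action pairs recorded")
```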

They found that the classifier distinguished the DRL models of each algorithm with relatively high confidence, ranging from 54% (in Cart-Pole) to 100% (in Atari Pong). As for the imitation learning phase, it managed to produce a replicated model, trained with the same algorithm, that reached performance similar to the target model’s, particularly in Cart-Pole. “We observe that the [attack] success rate increases when the replicated model has the same training algorithm as the target model,” wrote the researchers. “We expect this study can inspire people’s awareness about the severity of DRL model privacy issues, and come up with better solutions to mitigate such model attacks.”
