
Researchers develop technique to increase sample efficiency in reinforcement learning

Source: venturebeat.com

In reinforcement learning, the goal is generally to spur an AI-driven agent to complete tasks via a system of rewards. This is achieved either by directly learning a mapping (a policy) from states to actions that maximizes expected return (policy gradient methods), or by deriving such a mapping from estimates of the expected return for each state-action pair (value-based methods).
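The two flavours of mapping described above can be sketched on a toy one-state, two-action problem. This is a minimal illustration, not anything from the paper itself: the reward means, learning rate, and sample counts below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-state, two-action problem (hypothetical reward means).
reward_means = np.array([1.0, 2.0])

def sample_return(action):
    return rng.normal(reward_means[action], 0.1)

# Value-based view: estimate the expected return Q(s, a) per action.
q = np.zeros(2)
counts = np.zeros(2)
for _ in range(2000):
    a = rng.integers(2)
    counts[a] += 1
    q[a] += (sample_return(a) - q[a]) / counts[a]  # incremental mean

# Policy-gradient view: nudge softmax policy parameters toward
# actions with higher return (REINFORCE with a mean baseline).
theta = np.zeros(2)
for _ in range(2000):
    probs = np.exp(theta) / np.exp(theta).sum()
    a = rng.choice(2, p=probs)
    g = sample_return(a) - q.mean()  # advantage-like signal
    grad = -probs
    grad[a] += 1.0                   # d log pi(a) / d theta
    theta += 0.05 * g * grad

print(np.round(q, 2))    # Q-estimates approximate the reward means
print(np.argmax(theta))  # the policy comes to prefer the higher-reward action
```

Both routes identify the better action; the difference is whether the expected return is estimated explicitly (the `q` array) or only implicitly through the policy parameters (`theta`).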

Model-based reinforcement learning (MBRL) aims to improve this by learning a model of the dynamics from an agent’s interactions with the environment that can be leveraged across many different tasks (aka transferability) and used for planning. To this end, researchers at Google, the University of Oxford, and UC Berkeley developed an approach — Ready Policy One (a not-so-subtle nod to Ernest Cline’s hit novel Ready Player One) — to acquiring data for training world models through exploration that jointly optimizes policies for both reward and model uncertainty reduction. The end result is that the policies leveraged for data collection also perform well in the true environment and can be tapped for evaluation.
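The article doesn't reproduce Ready Policy One's actual objective, but the joint "reward plus model-uncertainty reduction" idea can be sketched generically. Below, uncertainty is proxied by disagreement across an ensemble of dynamics models, a common device in MBRL; the ensemble, the weighting `beta`, and all dimensions are hypothetical, and real implementations would use trained neural networks rather than random linear models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ensemble of 5 linear dynamics models f_i(s, a) -> s'.
# In practice these would be neural networks fit to collected data.
ensemble = [rng.normal(size=(3, 3)) * 0.1 + np.eye(3) for _ in range(5)]

def disagreement(s_a):
    """Model-uncertainty proxy: variance of ensemble predictions."""
    preds = np.stack([W @ s_a for W in ensemble])  # shape (5, 3)
    return preds.var(axis=0).sum()

def exploration_reward(s_a, task_reward, beta=0.5):
    # The exploration policy is optimized for this combined signal,
    # so the data it collects both shrinks model uncertainty and
    # still performs well in the true environment.
    return task_reward + beta * disagreement(s_a)
```

Setting `beta=0` recovers pure reward-seeking behaviour; larger values push the policy toward state-action regions where the ensemble members disagree and the model has the most to learn.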

Ready Policy One frames data collection as an active learning problem: rather than optimizing directly for the best policy, it seeks to learn the best possible model of the environment. A tailored framework lets it adapt the level of exploration so the model improves in the fewest number of samples, and a stopping mechanism ends each collection phase when the incoming data resembles what has already been acquired.
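The article doesn't specify the paper's stopping criterion, so the sketch below uses a generic nearest-neighbour novelty check: collection stops once a new batch of transitions sits close, on average, to data already in the buffer. The function name, threshold `tau`, and all data here are hypothetical.

```python
import numpy as np

def batch_is_novel(new_batch, dataset, tau=2.0):
    """Stop-collection check: a batch is 'novel' if its samples are,
    on average, far from their nearest neighbours in the dataset."""
    dists = np.linalg.norm(
        new_batch[:, None, :] - dataset[None, :, :], axis=-1
    )  # pairwise distances, shape (len(new_batch), len(dataset))
    return dists.min(axis=1).mean() > tau

rng = np.random.default_rng(2)
dataset = rng.normal(size=(100, 4))           # previously acquired samples

familiar = rng.normal(size=(10, 4))           # same region as the dataset
far_away = rng.normal(size=(10, 4)) + 10.0    # clearly unexplored region

print(batch_is_novel(familiar, dataset))  # resembles existing data: stop
print(batch_is_novel(far_away, dataset))  # novel: keep collecting
```

A collection loop would call a check like this after each batch and hand control back to model training as soon as it returns `False`.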

In a series of experiments, the researchers evaluated whether their active learning approach for MBRL was more sample-efficient than existing approaches. In particular, they tested it on a range of continuous control tasks from research firm OpenAI’s Gym environment, and they found that Ready Policy One could lead to “state-of-the-art” efficiency when combined with the latest model architectures.

“We are particularly excited by the many future directions from this work,” wrote the study’s coauthors. “Most obviously, since our method is orthogonal to other recent advances in MBRL, [Ready Policy One] could be combined with state of the art probabilistic architectures … In addition, we could take a hierarchical approach by ensuring our exploration policies maintain core behaviors but maximize entropy in some distant unexplored region. This would require behavioral representations, and some notion of distance in behavioral space, and may lead to increased sample efficiency as we could better target specific state-action pairs.”

Related Posts

DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning

Source: computing.co.uk Alphabet subsidiary DeepMind announced on Monday that it has open-sourced Lab2D, a scalable environment simulator for artificial intelligence (AI) research that facilitates researcher-led experimentation with environment… Read More

A VR Film/Game with AI Characters Can Be Different Every Time You Watch or Play

Source: technologyreview.com The square-faced, three-legged alien shoves and jostles to get at the enormous plant taking over its tiny planet. But each bite just makes the forbidden… Read More

Researchers detail LaND, AI that learns from autonomous vehicle disengagements

Source: venturebeat.com UC Berkeley AI researchers say they’ve created AI for autonomous vehicles driving in unseen, real-world landscapes that outperforms leading methods for delivery robots driving on… Read More

Google Teases Large Scale Reinforcement Learning Infrastructure

Source: analyticsindiamag.com The current state-of-the-art reinforcement learning techniques require many iterations over many samples from the environment to learn a target task. For instance, the game Dota… Read More

Plan2Explore: Active Model-Building for Self-Supervised Visual Reinforcement Learning

Source: bair.berkeley.edu To operate successfully in unstructured open-world environments, autonomous intelligent agents need to solve many different tasks and learn new tasks quickly. Reinforcement learning has enabled… Read More

Is AI an Existential Threat?

Source: unite.ai When discussing Artificial Intelligence (AI), a common debate is whether AI is an existential threat. The answer requires understanding the technology behind Machine Learning (ML), and recognizing… Read More