<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>reinforcement learnin Archives - Artificial Intelligence</title>
	<atom:link href="https://www.aiuniverse.xyz/tag/reinforcement-learnin/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aiuniverse.xyz/tag/reinforcement-learnin/</link>
	<description>Exploring the universe of Intelligence</description>
	<lastBuildDate>Thu, 02 Jul 2020 06:43:03 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Reinforcement Learning: The Next Big Thing For AI (Artificial Intelligence)?</title>
		<link>https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/</link>
					<comments>https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Thu, 02 Jul 2020 06:43:00 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Future]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[reinforcement learnin]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=9918</guid>

					<description><![CDATA[<p>Source: forbes.com When it comes to AI, much of the attention has been on deep learning.&#160;And for good reason.&#160;This part of the AI world has seen great <a class="read-more-link" href="https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/">Reinforcement Learning: The Next Big Thing For AI (Artificial Intelligence)?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: forbes.com</p>



<p>When it comes to AI, much of the attention has been on deep learning.&nbsp;And for good reason.&nbsp;This part of the AI world has seen great strides, such as with image recognition.</p>



<p>But of course, there are other areas of AI that look promising, such as reinforcement learning.&nbsp;Keep in mind that cutting-edge companies like Google’s DeepMind and OpenAI have already made breakthroughs with this approach.</p>



<p>So what is reinforcement learning? Well, interestingly enough, it is not new. “Reinforcement learning is a classic behavioral phenomenon, known in the psychology literature since the early 1950s,” said Dr. Matt Johnson, who is a professor of psychology at Hult International Business School and the author of Blindsight: The (Mostly) Hidden Ways Marketing Reshapes Our Brains. “In its simplest form, it states that the frequency of a behavior will go up or down depending on the direct consequences of that behavior. This is true of animal behavior as well as human behavior.”</p>



<p>But some of the key principles of reinforcement learning have been applied to AI models. This is often referred to as deep reinforcement learning (since the approach is combined with deep learning).</p>



<p>“Reinforcement learning entails an agent, action and reward,” said Ankur Taly, who is the head of data science at Fiddler. “The agent, such as a robot or character, interacts with its surrounding environment and observes a specific activity, responding accordingly to produce a beneficial or desired result. Reinforcement learning adheres to a specific methodology and determines the best means to obtain the best result. It’s very similar to the structure of how we play a video game, in which the agent engages in a series of trials to obtain the highest score or reward. Over several iterations, it learns to maximize its cumulative reward.”</p>
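<p>As a concrete illustration of the agent/action/reward loop Taly describes, here is a minimal, self-contained Python sketch: tabular Q-learning on a toy “corridor” environment. The environment, reward scheme, and all hyperparameters are illustrative inventions, not taken from this article.</p>

```python
import random

N_STATES = 5        # corridor positions 0..4; position 4 is the goal
ACTIONS = [1, -1]   # move right or left (right listed first, so ties favor it)

def step(state, action):
    """Apply an action to the environment; reward 1.0 only at the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run many trial episodes; the agent learns to maximize cumulative reward."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
# The learned policy: from every non-goal position, move right toward the reward.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

<p>Over repeated episodes, the reward at the goal propagates backward through the value estimates, which is the “series of trials to obtain the highest score” dynamic described above.</p>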



<p>In fact, some of the most interesting use cases for reinforcement learning have been with complex games. Consider the case of DeepMind’s AlphaGo. The system used reinforcement learning to quickly learn how to play Go and was able to beat the world champion, Lee Sedol, in 2016 (the game has more possible positions than there are atoms in the universe).</p>



<p>But there have certainly been other applications of the technology that go beyond gaming.&nbsp;To this end, reinforcement learning has been particularly useful with robotics.&nbsp;For example, OpenAI has used this technique for a robotic arm that was able to solve the Rubik’s cube.&nbsp;</p>



<p>Reinforcement learning has even been shown to be effective at finding better solutions for tax policy and equality, as seen with Salesforce.com’s AI Economist. “We believe a reinforcement learning framework is well-suited for uncovering insights on how the behavior of economic agents could be influenced by pulling different policy ‘levers,’” said Richard Socher, who is the Chief Scientist at Salesforce. “This is one of many scenarios where we believe reinforcement learning can be utilized in the future.”</p>



<p>Here are some other areas where reinforcement learning can make an impact:&nbsp;</p>



<ul class="wp-block-list"><li>Entertainment: “The future consists of free-form environments that the next generation of ‘movie-goers’ and gamers are looking for,” said Yuheng Chen, who is the COO of rct studio. “AI-powered characters will co-adapt to produce elaborate storylines, and consumers will no longer be confined to fixed dialogues and rigid interaction between non-player characters.”</li><li>Healthcare: “Imagine trying to use reinforcement learning to teach an AI doctor how to treat a medical patient,” said Noah Giansiracusa, who is an Assistant Professor of Mathematical Sciences at Bentley University. “The AI doctor might try medications almost randomly to see what effect they have, and over time should learn the patterns and develop an understanding of which medications work best in which situations. But we obviously can&#8217;t let the AI doctor perform these experiments on real patients, and physiology is far too complicated to build a suitable computer simulation of the human body to experiment on virtually. However, with vast troves of medical data, when the AI doctor wants to try a certain medication on a certain patient, we can look through the data and find an actual historic patient who had similar symptoms and vitals as the current patient, and even find such a patient who was then given the medication in question — thus the AI doctor is not actually performing new experiments to learn; it is suggesting experiments to try, then looking back at past data to see what typically happened when that action was taken.”</li></ul>



<p>Reinforcement learning is still in its nascent phase.&nbsp;But given the advances so far, this approach to AI is likely to become more important. “I believe reinforcement learning is on the cusp of rippling through and disrupting a lot of industries,” said Giansiracusa.</p>
<p>The post <a href="https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/">Reinforcement Learning: The Next Big Thing For AI (Artificial Intelligence)?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>DeepMind trains robots to insert USB keys and stack colored blocks</title>
		<link>https://www.aiuniverse.xyz/deepmind-trains-robots-to-insert-usb-keys-and-stack-colored-blocks/</link>
					<comments>https://www.aiuniverse.xyz/deepmind-trains-robots-to-insert-usb-keys-and-stack-colored-blocks/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Tue, 11 Feb 2020 06:56:43 +0000</pubDate>
				<category><![CDATA[Data Robot]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[reinforcement learnin]]></category>
		<category><![CDATA[researchers]]></category>
		<category><![CDATA[Robots]]></category>
		<category><![CDATA[trains]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=6672</guid>

					<description><![CDATA[<p>Source: venturebeat.com Robots perform better at a range of tasks when they draw on a growing body of experience. That’s the assertion of a team of researchers <a class="read-more-link" href="https://www.aiuniverse.xyz/deepmind-trains-robots-to-insert-usb-keys-and-stack-colored-blocks/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-trains-robots-to-insert-usb-keys-and-stack-colored-blocks/">DeepMind trains robots to insert USB keys and stack colored blocks</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: venturebeat.com</p>



<p>Robots perform better at a range of tasks when they draw on a growing body of experience. That’s the assertion of a team of researchers hailing from DeepMind, who in a preprint paper propose a technique called reward sketching. They claim it’s an effective way of eliciting human preferences to learn a reward function — a function describing how an AI agent should behave — which can then be used to retrospectively annotate all historical data, collected for different tasks, with predicted rewards for the new task. This annotated data set can then be used to learn manipulation policies (probability distributions over actions given certain states) via reinforcement learning from visual input, without interaction with a real robot, the team says.</p>



<p>The work builds on a DeepMind study published in January 2020, which described a technique — continuous-discrete hybrid learning — that optimizes for discrete and continuous actions simultaneously, treating hybrid problems in their native form. As something of a precursor to that paper, in October 2019, the Alphabet subsidiary demonstrated a novel way of transferring skills from simulation to a physical robot.</p>



<p>“[Our] approach makes it possible to scale up RL in robotics, as we no longer need to run the robot for each step of learning. We show that the trained batch [reinforcement learning] agents, when deployed in real robots, can perform a variety of challenging tasks involving multiple interactions among rigid or deformable objects,” wrote the coauthors of this latest paper. “Moreover, they display a significant degree of robustness and generalization. In some cases, they even outperform human teleoperators.”</p>



<p>As the team explains, at the heart of reward sketching are three key ideas: efficient elicitation of user preferences to learn reward functions, automatic annotation of all historical data with learned reward functions, and harnessing the data sets to learn policies from stored data via reinforcement learning.</p>
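<p>The three ideas above can be sketched in a few lines of Python: a human “sketches” rewards on just a few frames, a simple reward model is fit to them (here, toy 1-D least squares on an invented distance feature), and that model then annotates every frame in storage. All names and numbers are illustrative stand-ins, not details of DeepMind’s system.</p>

```python
from dataclasses import dataclass

@dataclass
class Frame:
    gripper_to_goal: float  # invented feature: distance from gripper to target
    reward: float = 0.0     # filled in later by the learned reward model

# 1) Elicit preferences: a human annotates a handful of (distance, reward) pairs.
sketched = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.95)]

# 2) Fit a reward model: ordinary least squares for reward ≈ w*distance + b.
n = len(sketched)
sx = sum(d for d, _ in sketched)
sy = sum(r for _, r in sketched)
sxx = sum(d * d for d, _ in sketched)
sxy = sum(d * r for d, r in sketched)
w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - w * sx) / n

# 3) Retrospectively annotate all historical frames with predicted rewards,
#    so stored experience can feed batch RL without manual labeling.
storage = [Frame(0.8), Frame(0.4), Frame(0.05)]
for frame in storage:
    frame.reward = w * frame.gripper_to_goal + b
```

<p>Being closer to the goal now earns a higher predicted reward across the whole data set, even for episodes recorded long before the new task was defined.</p>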



<p>For instance, a human teleoperates a robot with a six-degree-of-freedom mouse and a gripper button or a handheld virtual reality controller to provide first-person demonstrations of a target task. To specify a new target task, the operator controls the robot to provide several successful (and optionally unsuccessful) examples of completing the task, and these demonstrations help to bootstrap the reward learning by providing examples of successful behavior with high rewards.</p>



<p>In the researchers’ proposed approach, all robot experience — including demonstrations, teleoperated trajectories, human play data, and experience from the execution of either scripted or learned policies — is accumulated into what’s called NeverEnding Storage (NES). A metadata system implemented as a relational database ensures it’s appropriately annotated and queried; it attaches environment and policy metadata to every trajectory, as well as arbitrary human-readable labels and reward sketches.</p>



<p>In the reward-sketching phase, humans annotate a subset of episodes from NES (including task-specific demos) with reward values, using a technique that allows a single person to produce hundreds of annotations per minute. These annotations feed into a reward model that’s then used to predict reward values for all experience in NES, so that all historical data can be leveraged when training a policy for a new task, without requiring manual annotation of the whole repository.</p>



<p>An agent is trained with 75% of the batch drawn from the entirety of NES and 25% from the data specific to the target task. Then, it’s deployed to a robot, which enables the collection of more experience to be used for reward sketching or reinforcement learning.</p>
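<p>The 75/25 batch mix described above can be sketched directly. This hypothetical Python helper simply composes each training batch from the two pools; uniform sampling with replacement is an assumption made for illustration.</p>

```python
import random

def mixed_batch(nes, task_data, batch_size=8, nes_fraction=0.75, seed=0):
    """Draw a training batch: nes_fraction from all of NES, the rest from task data."""
    rng = random.Random(seed)
    n_nes = round(batch_size * nes_fraction)
    batch = [rng.choice(nes) for _ in range(n_nes)]
    batch += [rng.choice(task_data) for _ in range(batch_size - n_nes)]
    rng.shuffle(batch)  # avoid grouping the two sources within the batch
    return batch

nes = [("nes", i) for i in range(100)]    # stand-in for all historical experience
task = [("task", i) for i in range(10)]   # stand-in for target-task experience
batch = mixed_batch(nes, task)
print(sum(1 for source, _ in batch if source == "nes"))  # 6 of the 8 samples
```

<p>Blending the full store with task-specific data this way lets the agent benefit from all past experience while still focusing on the target task.</p>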



<p>In experiments, the DeepMind team used a Sawyer robot with a gripper and a wrist force-torque sensor. Observations were provided by three cameras around a cage, as well as two wide-angle cameras and one depth camera mounted at the wrist, and proprioceptive sensors in the arm. In total, the team collected over 400 hours of multiple-camera video, proprioception (i.e., perception or awareness of position and movement), and actions from behavior generated by human teleoperators, as well as by random, scripted, and learned policies.</p>



<p>The researchers trained multiple reinforcement learning agents in parallel for 400,000 steps and evaluated the most promising on the real-world robot. Tasked with lifting and stacking rectangular objects, the Sawyer successfully lifted 80% of the time and stacked 60% of the time, and 80% and 40% of the time, respectively, when those objects were positioned in “adversarial” ways. Perhaps more impressively, in a separate task involving the precise insertion of a USB key into a computer port, the agent — when provided reward sketches from over 100 demonstrators — reached a success rate of over 80% within 8 hours.</p>



<p>“The multi-component system allows a robot to solve a variety of challenging tasks that require skillful manipulation, involve multi-object interaction, and consist of many time steps,” wrote the researchers. “There is no need to worry about wear and tear, limits of real time processing, and many of the other challenges associated with operating real robots. Moreover, researchers are empowered to train policies using their batch [reinforcement learning] algorithm of choice.”</p>



<p>They leave to future work finding ways to minimize human-in-the-loop training and to reduce the agents’ sensitivity to “significant perturbations” in the setup.</p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-trains-robots-to-insert-usb-keys-and-stack-colored-blocks/">DeepMind trains robots to insert USB keys and stack colored blocks</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deepmind-trains-robots-to-insert-usb-keys-and-stack-colored-blocks/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
