Google Brain’s DRL Helps Robots ‘Think While Moving’
When chasing a bouncing ball, a human heads to where they expect the ball to go. If things change (for example, a cat swats the ball and it bounces off in a new direction), the human corrects course in real time.
Robots can have a hard time making such adjustments, because they typically work in a sequential loop: observe the state, compute an action, execute it, and only then observe again, rather than thinking while moving.
Google Brain, UC Berkeley, and X Lab have proposed a concurrent Deep Reinforcement Learning (DRL) algorithm that enables robots to take a broader, longer-term view of tasks and behaviours and to decide on their next action before the current one is completed. The paper has been accepted by ICLR 2020.
DRL has achieved tremendous success in scenarios such as zero-sum games and robotic grasping. These achievements, however, came largely in blocking environments, where the model assumes the state does not change between the moment it is observed and the moment the resulting action(s) are executed.
In real-world “concurrent environments,” by contrast, the state can evolve substantially while the agent is still computing, and actions executed in a sequential, blocking fashion can fail because the environment has changed since the agent computed them.
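To make this failure mode concrete, here is a minimal sketch (not from the paper) of a blocking agent aiming at a target that moves at constant velocity. The class `MovingTarget`, the function `blocking_miss_distance`, and the numbers used are hypothetical illustrations; the point is only that the miss distance grows with the delay between observation and execution.

```python
from dataclasses import dataclass


@dataclass
class MovingTarget:
    """A target (e.g. a bouncing ball) moving at constant velocity along one axis."""
    position: float   # metres, at the moment of observation (t = 0)
    velocity: float   # metres per second

    def position_at(self, t: float) -> float:
        return self.position + self.velocity * t


def blocking_miss_distance(target: MovingTarget, latency_s: float) -> float:
    """Blocking execution: the action aims at the position observed at t = 0,
    but only takes effect `latency_s` seconds later. Returns how far the
    target has moved away from the aim point by then."""
    aim_point = target.position_at(0.0)      # computed from the (now stale) observation
    actual = target.position_at(latency_s)   # where the target really is at execution time
    return abs(actual - aim_point)


ball = MovingTarget(position=0.0, velocity=2.0)  # illustrative numbers only
for latency in (0.0, 0.1, 0.5):
    print(f"latency={latency:.1f}s  miss={blocking_miss_distance(ball, latency):.2f} m")
```

In this toy model the error is simply velocity times latency; a real robot faces the same effect in a much higher-dimensional state.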
The main idea of the proposed model is to enable a robot to act with concurrent control, “where sampling an action from the policy must be done concurrently with the time evolution.”
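A rough way to see why concurrent control can save time is to compare episode durations under a simplified timing model, sketched below as an assumption rather than the paper's analysis: in blocking mode the robot stands still while the policy computes, whereas in concurrent mode the next action is computed while the current one executes. The function names and timings are hypothetical.

```python
def blocking_duration(n_actions: int, compute_s: float, execute_s: float) -> float:
    """Blocking control: observe, compute an action, execute it, repeat.
    The robot stands still while the policy is computing."""
    return n_actions * (compute_s + execute_s)


def concurrent_duration(n_actions: int, compute_s: float, execute_s: float) -> float:
    """Concurrent control (simplified): the next action is computed while the
    current one is still executing, so computation and motion overlap."""
    if n_actions <= 0:
        return 0.0
    # The first action must be computed before the robot moves, and the last
    # action runs to completion; every step in between costs whichever of
    # computing or executing takes longer.
    return compute_s + (n_actions - 1) * max(compute_s, execute_s) + execute_s


n, compute, execute = 10, 0.2, 0.5   # illustrative timings, not from the paper
b = blocking_duration(n, compute, execute)
c = concurrent_duration(n, compute, execute)
print(f"blocking={b:.1f}s  concurrent={c:.1f}s  saved={100 * (1 - c / b):.0f}%")
```

This model only captures the overlap of computation and motion. It ignores the harder problem the researchers address: a concurrent policy must choose actions from observations that are already slightly out of date.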
The researchers began with standard RL formulations in both discrete-time and continuous-time settings. They then extended Markov Decision Processes (MDPs) to concurrent actions, in which the agent observes the current state while a previous action is still being executed. The team concluded that these modifications to the MDP are sufficient to represent concurrent actions.
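One common way to keep such a process Markov is to fold information about the in-flight action into the state, since what the robot observes next depends on the action it is still executing. The sketch below shows that idea under stated assumptions: the `ConcurrentState` fields, the `augment` helper, and the linear "remaining action" estimate are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Sequence, Tuple


class ConcurrentState(NamedTuple):
    """Hypothetical augmented state for a concurrent MDP (names are illustrative)."""
    observation: Tuple[float, ...]       # sensor reading captured while the robot is moving
    previous_action: Tuple[float, ...]   # action still executing when the observation was taken
    remaining_action: Tuple[float, ...]  # part of previous_action not yet carried out (assumed linear)
    elapsed_fraction: float              # fraction of previous_action already executed, in [0, 1]


def augment(observation: Sequence[float],
            previous_action: Sequence[float],
            elapsed_s: float,
            action_duration_s: float) -> ConcurrentState:
    """Bundle the raw observation with information about the in-flight action,
    so the next state depends only on the augmented state and the new action."""
    if action_duration_s <= 0:
        raise ValueError("action_duration_s must be positive")
    fraction = min(max(elapsed_s / action_duration_s, 0.0), 1.0)
    remaining = tuple((1.0 - fraction) * a for a in previous_action)
    return ConcurrentState(tuple(observation), tuple(previous_action), remaining, fraction)


s = augment(observation=[0.3, 0.1], previous_action=[0.2, -0.4],
            elapsed_s=0.05, action_duration_s=0.2)
print(s.elapsed_fraction, s.remaining_action)
```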
The research team introduced value-based DRL algorithms that can cope with concurrent environments, and evaluated them on both a large-scale simulated robotic grasping task and a real-world robotic grasping task.
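"Value-based" means the agent learns an action-value function Q and acts greedily with respect to it. The tabular sketch below illustrates how a Q-learning backup might look once the previous action is part of the state. The paper uses deep networks rather than a table, and the time-scaled discount `gamma ** (dt / step_dt)` is an assumption borrowed from standard continuous-time and semi-MDP treatments, not the authors' exact operator. All names here are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def concurrent_q_update(Q: QTable,
                        state: Hashable, prev_action: Hashable, action: Hashable,
                        reward: float, next_state: Hashable,
                        actions: Sequence[Hashable],
                        dt: float, step_dt: float,
                        alpha: float = 0.1, gamma: float = 0.99) -> float:
    """One tabular Q-learning backup on a concurrency-augmented state.

    The table is keyed by ((state, prev_action), action): the action that is
    still executing is part of the state. The discount is raised to
    dt / step_dt so that longer transitions are discounted more (an
    illustrative assumption, not the paper's exact formulation).
    """
    key = ((state, prev_action), action)
    # After this step, `action` becomes the action still in flight.
    next_aug_state = (next_state, action)
    best_next = max(Q[(next_aug_state, a)] for a in actions)
    target = reward + gamma ** (dt / step_dt) * best_next
    Q[key] += alpha * (target - Q[key])
    return Q[key]


Q: QTable = defaultdict(float)
ACTIONS = ("left", "right", "stay")
v = concurrent_q_update(Q, state="s0", prev_action="stay", action="left",
                        reward=1.0, next_state="s1", actions=ACTIONS,
                        dt=0.15, step_dt=0.1, alpha=0.5)
print(v)
```

Keying the table on the previous action lets the same observation map to different values depending on what the robot is already doing, which is the intuition behind representing concurrency inside the MDP.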
In the large-scale simulated robotic grasping task, the concurrent model acted 31.3 percent faster than the blocking-execution baseline. In the real-world robotic grasping task, the concurrent model learned smoother trajectories that were 49 percent faster.