HOW TO IMPROVE GENERALISATION IN DEEP REINFORCEMENT LEARNING?
A team from NYU and Modl.ai has posited in recent work that simple image processing techniques (listed below) can improve generalisation in deep reinforcement learning systems.
RL systems are typically trained on game platforms, which serve as test beds for teaching agents new tasks through visual cues. By manipulating the agents' fields of view on these platforms, the researchers believe they can enable better generalisation. They mainly used the following techniques:
- Cropping shrinks observations down to just local information around the agent.
- Rotation keeps the agents always facing forward, so any action they take always happens from the same perspective.
- Translation then orients the observations around the agent so it is always at the center of its view.
The researchers evaluated these techniques on two game variants: a GVGAI port of the dungeon system from The Legend of Zelda, and a simplified version of that game called Simple Zelda.
Why Models Fail To Generalise
Although deep reinforcement learning agents can solve complex tasks, they struggle to transfer their experience to new environments. According to findings from researchers at OpenAI, generalising between tasks remains difficult for state-of-the-art deep reinforcement learning (RL) algorithms.
Note, however, that "generalisation" means something quite different in reinforcement learning than in, say, supervised learning.
In supervised learning, generalisation usually refers to performance on a test set drawn from the same (or a very similar) data distribution as the training data. In RL, by contrast, generalisation typically refers to transfer between tasks, for example from the levels an agent was trained on to levels it has never seen.
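One common way to quantify this kind of generalisation, used for instance in OpenAI's CoinRun experiments, is to train on a fixed set of procedurally generated levels and evaluate on held-out ones; the drop in average return is the generalisation gap. The sketch below is illustrative only: levels are identified by integer seeds, and `run_episode(seed)` is a hypothetical caller-supplied function (standing in for a real environment plus a trained policy) that returns one episode's total reward.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple


def split_levels(num_train: int, num_test: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw disjoint sets of level seeds for training and held-out testing."""
    rng = random.Random(seed)
    seeds = rng.sample(range(1_000_000), num_train + num_test)  # sampled without replacement
    return seeds[:num_train], seeds[num_train:]


def generalisation_gap(
    run_episode: Callable[[int], float],
    train_seeds: List[int],
    test_seeds: List[int],
) -> float:
    """Mean return on training levels minus mean return on unseen levels.

    `run_episode` is hypothetical: it should play one episode of the trained
    policy on the level generated from the given seed and return its reward.
    """
    train_return = mean(run_episode(s) for s in train_seeds)
    test_return = mean(run_episode(s) for s in test_seeds)
    return train_return - test_return


# Toy stand-in for an overfitted agent: it solves every training level
# but only earns 0.3 on average on levels it has never seen.
train, test = split_levels(num_train=100, num_test=50)
train_set = set(train)


def fake_run(seed: int) -> float:
    return 1.0 if seed in train_set else 0.3


print(round(generalisation_gap(fake_run, train, test), 2))
```

A gap near zero would mean the policy performs as well on new levels as on the ones it trained on; a large gap signals memorisation rather than generalisation.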
The NYU researchers observed that this failure to generalise extends to many environments that use a static third-person perspective.
A Simple Way To Generalisation
To begin with, the authors consider the hypothesis that generalisation in deep RL is nearly impossible with existing methods. Learning generalisable policies, they posit, is more effective when the games used for training present an agent-centric view.
For their experiments, they used three well-known image processing techniques: translation, rotation and cropping.
Think of translation as the ‘re-centre’ button on Google Maps. In the games used for the experiments, the authors apply translation so that the observation shifts around the agent, keeping it at the centre of the view instead of letting it wander across a fixed map. What then changes from frame to frame is where the target lies relative to the agent, which is exactly what the agent must learn in order to close the distance to it. As shown in the above picture, the agent (pink square) is centred in the view on the right.
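The translation step can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the observation is a 2D grid of tile characters (a plain list of lists), that the agent's row and column are known, and that `'#'` is used as a padding tile. One way to keep every map cell visible wherever the agent stands is to embed the H×W map in a (2H−1)×(2W−1) canvas, so the agent always lands at the centre.

```python
from typing import List, Tuple

Grid = List[List[str]]


def translate(grid: Grid, agent: Tuple[int, int], pad: str = "#") -> Grid:
    """Shift a map so the agent sits at the centre of the returned view.

    The H x W map is copied into a (2H-1) x (2W-1) canvas filled with `pad`,
    which is just large enough to keep every map cell in view whatever the
    agent's position.
    """
    h, w = len(grid), len(grid[0])
    r, c = agent
    out = [[pad] * (2 * w - 1) for _ in range(2 * h - 1)]
    for i in range(h):
        for j in range(w):
            # Offset every cell so that the agent's cell maps to (h-1, w-1).
            out[i - r + h - 1][j - c + w - 1] = grid[i][j]
    return out


level = [
    list("A.."),
    list("..."),
    list("..G"),
]
for row in translate(level, agent=(0, 0)):
    print("".join(row))
# #####
# #####
# ##A..
# ##...
# ##..G
```

Wherever the agent `A` starts, it ends up at the same central cell, so the network only has to learn where the goal `G` is relative to that fixed point.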
Rotation helps the agent learn navigation because it simplifies the task. For example, to reach an object on its right, the agent just rotates until that object is above it and then moves up. If the object is on its left, the same strategy applies: rotate, then move up. Without rotation, the agent has to move in a different direction depending on where the target object is.
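Rotation can be sketched in the same style. Again this is a hypothetical illustration: it assumes an agent-centred square grid (for example the output of a translation step, so the agent stays in the middle) and an agent heading given as one of `'up'`, `'right'`, `'down'` or `'left'`. The view is turned so the agent's heading always points to the top of the observation, which is what makes "rotate, then move up" work for any target.

```python
from typing import List

Grid = List[List[str]]

# Number of 90-degree counter-clockwise turns that bring each heading to "up".
TURNS_TO_UP = {"up": 0, "right": 1, "down": 2, "left": 3}


def rotate_ccw(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def rotate_to_heading(grid: Grid, heading: str) -> Grid:
    """Turn an agent-centred view so the agent's heading points to the top."""
    for _ in range(TURNS_TO_UP[heading]):
        grid = rotate_ccw(grid)
    return grid


# Agent A faces right and the goal G is directly to its right.
view = [
    list("..."),
    list(".AG"),
    list("..."),
]
for row in rotate_to_heading(view, "right"):
    print("".join(row))
# .G.
# .A.
# ...
```

Once the agent turns to face the goal, the goal always appears directly above it, so a single "move up" (forward) action reaches it regardless of which side it started on.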
Neural networks, the authors write, can be thought of as behaving like locality-sensitive hashing (LSH) functions, which map similar inputs to similar outputs. In environments with a combinatorially large number of arrangements of agent and object locations, however, this becomes unwieldy, because observations that call for the same action can look very different from one another. Intuitively, cropping the visual field lets the agent focus on what is local to it, which helps the deep reinforcement learning system learn better.
Cropping has a drawback, though: the agent loses the information outside its window and, with it, any global context.
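A minimal cropping sketch, using the same hypothetical grid-of-characters format: it keeps only a (2k+1)×(2k+1) window centred on the agent, where the radius k is a tunable assumption rather than a value from the paper, and fills cells outside the map with a padding tile. Anything farther than k tiles from the agent is discarded, which is precisely the loss of global context described above.

```python
from typing import List, Tuple

Grid = List[List[str]]


def crop(grid: Grid, agent: Tuple[int, int], radius: int, pad: str = "#") -> Grid:
    """Return the (2*radius+1) x (2*radius+1) window centred on the agent.

    Cells that fall outside the map are filled with `pad`; everything more
    than `radius` tiles away from the agent is dropped.
    """
    h, w = len(grid), len(grid[0])
    r, c = agent
    window = []
    for i in range(r - radius, r + radius + 1):
        row = []
        for j in range(c - radius, c + radius + 1):
            inside = 0 <= i < h and 0 <= j < w
            row.append(grid[i][j] if inside else pad)
        window.append(row)
    return window


level = [
    list("A...."),
    list("....."),
    list("....."),
    list("....G"),
]
for row in crop(level, agent=(0, 0), radius=1):
    print("".join(row))
# ###
# #A.
# #..
```

The goal `G` is no longer visible in the cropped view: the observation is smaller and more uniform, but the agent now has to explore to find distant objects.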
To conclude, the authors say that RL agents are better off learning from a first-hand, agent-centric view than from a third-person perspective. Humans are different: detaching ourselves from our immediate surroundings and taking a third-person view often gives us more insight.
The authors also admit in their paper that their hypothesis runs counter to received wisdom and to implicit assumptions in mainstream deep reinforcement learning research.
They suggest that deep reinforcement learning may fail to generalise on games with static third-person representations because the network architectures cannot learn the kinds of input transformations that generalisable policies require.
In any case, they also imply that input representation plays a much larger role than is commonly assumed.
According to the researchers at OpenAI, generalisation in deep RL systems could be improved by further research in the following areas:
- Investigate the relationship between environment complexity and the number of levels required for good generalization
- Investigate whether different recurrent architectures are better suited for generalization in these environments
- Explore ways to effectively combine different regularization methods