Researchers combine reinforcement learning and NLP to escape a Grue monster
AI researchers from Georgia Tech and Microsoft Research have created an AI agent that combines reinforcement learning and natural language processing (NLP) to outperform state-of-the-art question-answering AI in eight of nine text adventure games. The researchers say the model, MC!Q*BERT, is the first known learning agent to consistently get past a bottleneck where a player is eaten by a Grue monster in Zork, one of the first interactive computer games.
MC!Q*BERT is built in part on Q*BERT, a deep reinforcement learning agent that learns and builds a knowledge graph by asking questions about the world. Every observation made over the course of a game prompts a series of questions, and the answers are then converted and added to the knowledge graph.
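The question-driven graph update described above can be illustrated with a minimal sketch. This is not the authors' code: the question wording, the `update_graph` helper, the triple format, and the keyword-matching `toy_answer` function (a stand-in for a real question-answering model) are all assumptions made for illustration.

```python
from typing import Callable, List, Set, Tuple

# A knowledge graph stored as (subject, relation, object) triples.
Triple = Tuple[str, str, str]


def update_graph(
    graph: Set[Triple],
    observation: str,
    answer: Callable[[str, str], List[str]],
) -> Set[Triple]:
    """Ask fixed questions about an observation and store the answers as triples.

    The two questions below are illustrative, not the paper's exact templates.
    """
    locations = answer("Where am I located?", observation)
    location = locations[0] if locations else "unknown"
    graph.add(("you", "in", location))
    for obj in answer("What objects are here?", observation):
        graph.add((obj, "in", location))
    return graph


def toy_answer(question: str, observation: str) -> List[str]:
    """Hypothetical stand-in for a QA model: plain keyword matching."""
    text = observation.lower()
    if question.startswith("Where"):
        return [place for place in ("kitchen", "cellar") if place in text]
    return [item for item in ("lantern", "sword") if item in text]


if __name__ == "__main__":
    kg = update_graph(
        set(), "You are in the kitchen. A brass lantern is here.", toy_answer
    )
    print(sorted(kg))
```

In a real agent the `answer` callable would wrap a trained question-answering model rather than a keyword matcher; the point here is only that each observation turns into a handful of graph facts.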
Q*BERT is based on KG-A2C, an approach to using reinforcement learning in natural language action spaces published earlier this year at ICLR by Georgia Tech PhD student Prithviraj Ammanabrolu.
For answering questions, Q*BERT uses a pretrained version of ALBERT, a variation of the BERT language model. The model is then fine-tuned using the SQuAD benchmark and a newly created data set of text adventure game question and answer pairs called Jericho-QA. Jericho-QA contains more than 200,000 question-answer pairings. The approach was detailed earlier this month in a paper published on preprint repository arXiv titled “How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds.”
“We present techniques for automatically detecting bottlenecks and efficiently learning policies that take advantage of the natural partitions in the state space,” the authors wrote in that paper. “We see text games as simplified analogues for systems capable of long-term dialogue with humans, such as in assistance with planning complex tasks, and also discrete planning domains such as logistics.”
A major challenge in building AI that can succeed in text adventure games is overcoming bottlenecks, points where players are commonly trapped and eliminated. In Zork, for example, a common bottleneck occurs when players moving about without a light are eaten by a Grue monster. To advance, the AI must recognize and carry out a specific series of actions. The authors said many existing models fail to clear such bottlenecks. Q*BERT, they assert, automatically detects bottlenecks and then creates policies to overcome them. A dependency graph tracks the items Q*BERT must collect and the locations in the game it must visit in order to advance.
All experiments took place within the Jericho simulator created by Microsoft. If an agent failed to collect a reward in the simulated environment, the authors took this to mean that it may be stuck at a bottleneck. Once a bottleneck is identified, the agent uses a method called modularity chaining to “backtrack to previously visited states” and overcome it.
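The detection idea, treating a stretch without reward as a sign of a bottleneck and then backtracking, can be sketched roughly as follows. This is an illustrative simplification rather than the paper's algorithm: the `detect_bottleneck` function, the `patience` threshold, and the choice to backtrack to the most recently rewarded step are all assumptions.

```python
from typing import List, Optional, Tuple


def detect_bottleneck(
    rewards: List[float], patience: int
) -> Optional[Tuple[int, int]]:
    """Flag a possible bottleneck after `patience` steps without reward.

    Returns (flagged_step, backtrack_step), or None if no bottleneck is
    flagged. backtrack_step is the index of the most recent rewarded step,
    or -1 when no reward has been seen yet (meaning the start state).
    """
    last_rewarded = -1
    for step, reward in enumerate(rewards):
        if reward > 0:
            last_rewarded = step
        elif step - last_rewarded >= patience:
            # Too long without reward: assume the agent is stuck here.
            return step, last_rewarded
    return None


if __name__ == "__main__":
    # Reward at step 1, then nothing for the rest of the episode.
    episode_rewards = [0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
    print(detect_bottleneck(episode_rewards, patience=3))
```

A full agent would restore the simulator to the saved state at `backtrack_step` and continue exploring from there; how the saved states are chosen and how policies are chained afterward is left out of this sketch.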
In other recent question-answering NLP news, last week Google AI, together with partners from the University of Washington and Princeton University, announced the launch of the EfficientQA competition, a question-answering challenge for creating NLP capable of storing knowledge. Top-performing models will compete live against human trivia experts.