<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Reinforcement Archives - Artificial Intelligence</title>
	<atom:link href="https://www.aiuniverse.xyz/tag/reinforcement/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aiuniverse.xyz/tag/reinforcement/</link>
	<description>Exploring the universe of Intelligence</description>
	<lastBuildDate>Sat, 15 Jun 2019 09:49:32 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Amazon Dives Deep into Reinforcement Learning</title>
		<link>https://www.aiuniverse.xyz/amazon-dives-deep-into-reinforcement-learning/</link>
					<comments>https://www.aiuniverse.xyz/amazon-dives-deep-into-reinforcement-learning/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 15 Jun 2019 09:49:32 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Deep]]></category>
		<category><![CDATA[Dives]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[Reinforcement]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=3852</guid>

					<description><![CDATA[<p>Source:- forbes.com Machine learning is one of the cornerstones of artificial intelligence. If systems can’t learn, they can’t adapt or apply knowledge from one domain to another. And <a class="read-more-link" href="https://www.aiuniverse.xyz/amazon-dives-deep-into-reinforcement-learning/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/amazon-dives-deep-into-reinforcement-learning/">Amazon Dives Deep into Reinforcement Learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source:- forbes.com</p>
<p class="speakable-paragraph">Machine learning is one of the cornerstones of artificial intelligence. If systems can’t learn, they can’t adapt or apply knowledge from one domain to another. And yet, machine learning is just a component of the overall goals of AI. Beyond machine learning, we still need ways to interact with the external environment, reason about things that are learned, and set goals that drive higher-order aspects of intelligence. Regardless of machine learning’s role in the overall context of AI, researchers and enterprises still find themselves grappling with the challenges of making ML work across the broad range of scenarios and applications they are looking to apply it to.</p>
<p>There are three main types of machine learning approaches: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, machines learn by example: they are fed large numbers of good-quality labeled examples (“training”) and build a model of that learning that generalizes to similar tasks. In unsupervised learning, machines automatically find meaningful patterns and groupings in data, developing a model through the discovery of those patterns. Reinforcement learning takes a different approach. Rather than being given good examples or discovering patterns on its own, a reinforcement learning (RL) system is given a final goal and learns through trial and error, discovering the optimal solution or best path to that goal.</p>
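<p>The trial-and-error idea can be made concrete with a minimal sketch: a hypothetical three-armed bandit with made-up reward values, learned by an epsilon-greedy agent. This is illustrative only, not code from any system discussed here:</p>

```python
import random

# Hypothetical bandit: three actions with unknown expected rewards.
TRUE_REWARDS = [0.2, 0.5, 0.8]

def pull(action):
    """Return a noisy reward for the chosen action."""
    return TRUE_REWARDS[action] + random.gauss(0, 0.1)

estimates = [0.0, 0.0, 0.0]  # the agent's learned value of each action
counts = [0, 0, 0]
epsilon = 0.1                # fraction of the time the agent explores at random

random.seed(0)
for _ in range(2000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore a random action
    else:
        action = estimates.index(max(estimates))  # exploit the best-known action
    reward = pull(action)
    counts[action] += 1
    # Incremental average: the estimate moves toward the observed rewards.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates.index(max(estimates)))  # the agent settles on action 2, the best arm
```

<p>With no labeled examples at all, the agent’s value estimates converge toward the true rewards purely from the feedback signal, which is the essence of the RL approach.</p>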
<p>Reinforcement learning has generally been applied to solving games and puzzles. From early AI applications in checkers and chess to more recent RL-based systems that have learned to play some of the most difficult games, such as Go, DOTA, and other multi-player games, RL has shown that it can offer significant strength on some of the harder challenges posed to AI. Despite these possibilities, RL approaches are not as widely implemented in the enterprise as supervised or unsupervised learning. This is because companies find the more task-oriented supervised learning approaches suitable for the recognition, conversation, and predictive analytics patterns of AI, and the data-oriented unsupervised learning approaches applicable to the pattern-and-anomaly-discovery and hyperpersonalization patterns. This leaves the goal-oriented RL approaches for the autonomous systems and goal-driven solutions patterns.</p>
<p>Despite RL’s lower enterprise profile, it has a high profile in news and media. DeepMind, acquired by Google in 2014, has been making waves with its approach to Q-learning, using the RL method to beat top players at many competitive games. DeepMind sees RL as a gateway to solving many general problems, and possibly as the algorithmic path to the Artificial General Intelligence (AGI) challenge of a generally applicable machine learning method. While this remains to be seen, it has certainly given people much to think about, with personalities like Elon Musk, Max Tegmark, and others warning about the imminent possibility of superintelligence.</p>
<p>While fears of an imminent machine takeover are most likely overwrought, in reality RL is being applied to much more mundane real-world activities such as resource optimization, planning, navigation, and scenario simulation.</p>
<p>Recently, Amazon has been making significant waves of its own in the RL space. At the AWS re:Invent 2018 conference in Las Vegas last year, Amazon unveiled the DeepRacer RL platform and a league in which individuals compete to develop RL algorithms that optimize the path of an autonomous vehicle through a controlled course. While this might seem a trivial application, Amazon has been at the forefront of applying RL to its own real-world scenarios.</p>
<p>The company applies RL in combination with other ML methods to optimize its warehouse and logistics operations and to assist with automation in its various fulfillment facilities. It has also applied RL to supply chain optimization problems and to discovering optimal delivery paths.</p>
<p>More interestingly, the company has applied RL and other ML approaches to help create the latest iteration of its autonomous delivery drone. Unveiled at Amazon’s re:MARS 2019 conference, the drone will soon be shipping packages to houses around the world, pending regulatory approvals and technological advancements. While RL might be used in controlling flight operations and path discovery, what is more compelling is that the company used those methods to design the drone itself. According to company sources, Amazon used machine learning to iterate through and simulate over 50,000 drone design configurations before choosing the optimal approach.</p>
<p>The newly designed drone can fly vertically and hover like a quadcopter, but can also fly forward much like an airplane. The system is outfitted with multiple redundant sensors and cameras to observe its surroundings and handle complicated navigation around obstructions. The company showed a video in which the drone detected a clothesline traversing the landing zone, something that has been particularly challenging for other autonomous drone and computer vision applications.</p>
<p>Amazon is making it clear that it believes RL should be a first-class participant in the ML portfolio considered by enterprises. And there’s no reason RL shouldn’t be implemented as widely as the supervised and unsupervised forms of machine learning. Data scientists, machine learning developers, and even casual AI developers simply need more experience using and applying RL to the range of problems they are seeing. Once they become confident in those applications, there’s no doubt that we’ll start to see RL implemented more widely in enterprise and other environments. RL is not just for games anymore.</p>
<p>The post <a href="https://www.aiuniverse.xyz/amazon-dives-deep-into-reinforcement-learning/">Amazon Dives Deep into Reinforcement Learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/amazon-dives-deep-into-reinforcement-learning/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Google open-sources soccer reinforcement learning simulator</title>
		<link>https://www.aiuniverse.xyz/google-open-sources-soccer-reinforcement-learning-simulator/</link>
					<comments>https://www.aiuniverse.xyz/google-open-sources-soccer-reinforcement-learning-simulator/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 08 Jun 2019 10:38:19 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[open-sources]]></category>
		<category><![CDATA[Reinforcement]]></category>
		<category><![CDATA[simulator]]></category>
		<category><![CDATA[soccer]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=3628</guid>

					<description><![CDATA[<p>Source:- venturebeat.com About a dozen members of the Google Brain team today open-sourced Google Research Football Environment, a 3D reinforcement learning simulator for training AI to master soccer. <a class="read-more-link" href="https://www.aiuniverse.xyz/google-open-sources-soccer-reinforcement-learning-simulator/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/google-open-sources-soccer-reinforcement-learning-simulator/">Google open-sources soccer reinforcement learning simulator</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source:- venturebeat.com</p>
<p>About a dozen members of the Google Brain team today open-sourced Google Research Football Environment, a 3D reinforcement learning simulator for training AI to master soccer. The environment can simulate soccer matches, including particular scenarios like corner kicks, penalty kicks, goals, and offsides. The news comes at the start of the Women’s World Cup in France and a day after Google introduced pricing and games for its Stadia cloud gaming service.</p>
<p>“Researchers can directly experience how the game works by playing against each other or by dueling their agents. The game can be controlled by means of both keyboards and gamepads. Moreover, replays of several rendering qualities can be automatically stored while training, so that it is easy to inspect the policies agents are learning,” researchers said in a paper that accompanies the package on GitHub.</p>
<p>A beta version of the Google Research Football Environment is available on GitHub and includes a C++ game engine based on Gameplay Football, a simulator open-sourced in 2017.</p>
<p>The environment also includes implementations of state-of-the-art reinforcement learning algorithms (proximal policy optimization (PPO), DQN, and Impala), as well as a set of about a dozen different scenarios for training AI agents in what the researchers call the Football Academy.</p>
<p>This practice environment for particular scenarios includes corner kicks, 3-on-1 matches, and 11-on-11 matches with lazy opponents. In the initial results detailed in the research paper, Impala trained for 500 million steps achieved the best performance.</p>
<p>The 3D simulator can take into account both the location of a player on the pitch and raw pixel analysis to find the best way to pass the ball, overcome obstacles, defend forwards, and score goals.</p>
<p>Reinforcement learning through simulations has been applied to accomplish a number of challenging gaming tasks like training agents to beat humans in Starcraft, Quake III, Go, and Pong, but it’s also being used for a range of jobs from robotic arm and leg control to online recommendation tools.</p>
<p>The Google Research Football Environment works with OpenAI’s Gym reinforcement learning environment.</p>
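<p>Gym compatibility means agents interact with the simulator through the standard reset/step loop. A minimal sketch of that loop follows, using a made-up stand-in environment so the snippet is self-contained (the real gfootball package must be installed separately, and its observation and action spaces differ from this toy):</p>

```python
import random

class ToyEnv:
    """Stand-in for a Gym-style environment; the real Football Environment
    exposes the same reset()/step() interface."""
    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        observation = float(self.t)
        reward = 1.0 if action == 1 else 0.0  # hypothetical reward rule
        done = self.t >= self.episode_length
        return observation, reward, done, {}

# The canonical Gym interaction loop: reset, then step until the episode ends.
env = ToyEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice([0, 1])  # a random policy, for illustration
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)
```

<p>Any agent that speaks this interface, from a random policy to PPO or Impala, can be dropped into the same loop, which is what makes the Gym convention useful for benchmarking.</p>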
<p>The post <a href="https://www.aiuniverse.xyz/google-open-sources-soccer-reinforcement-learning-simulator/">Google open-sources soccer reinforcement learning simulator</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/google-open-sources-soccer-reinforcement-learning-simulator/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Reinforcement learning explained</title>
		<link>https://www.aiuniverse.xyz/reinforcement-learning-explained/</link>
					<comments>https://www.aiuniverse.xyz/reinforcement-learning-explained/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Fri, 07 Jun 2019 07:07:31 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AlphaGo program]]></category>
		<category><![CDATA[AlphaZero]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Reinforcement]]></category>
		<category><![CDATA[Stockfish]]></category>
		<category><![CDATA[TPUs]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=3602</guid>

					<description><![CDATA[<p>Source:- itworld.com Reinforcement learning uses rewards and penalties to teach computers how to play games and robots how to perform tasks independently You have probably heard about Google <a class="read-more-link" href="https://www.aiuniverse.xyz/reinforcement-learning-explained/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/reinforcement-learning-explained/">Reinforcement learning explained</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source:- itworld.com</p>
<h3>Reinforcement learning uses rewards and penalties to teach computers how to play games and robots how to perform tasks independently</h3>
<p>You have probably heard about Google DeepMind’s AlphaGo program, which attracted significant news coverage when it beat a 2-dan professional Go player in 2015. Improved evolutions of AlphaGo went on to beat a 9-dan (the highest rank) professional Go player in 2016, and the #1-ranked Go player in the world in May 2017. A new generation of the software, AlphaZero, was significantly stronger than AlphaGo by late 2017, and learned not only Go but also chess and shogi (Japanese chess).</p>
<p>AlphaGo and AlphaZero both rely on reinforcement learning to train. They also use deep neural networks as part of the reinforcement learning network, to predict outcome probabilities.</p>
<p>In this article, I’ll explain a little about reinforcement learning, how it has been used, and how it works at a high level. I won’t dig into the math, or Markov decision processes, or the gory details of the algorithms used. Then I’ll get back to AlphaGo and AlphaZero.</p>
<h2>What is reinforcement learning?</h2>
<p>There are three kinds of machine learning: unsupervised learning, supervised learning, and reinforcement learning. Each of these is good at solving a different set of problems.</p>
<p>Unsupervised learning, which works on a complete data set without labels, is good at uncovering structures in the data. It is used for clustering, dimensionality reduction, feature learning, and density estimation, among other tasks.</p>
<p>Supervised learning, which works on a complete labeled data set, is good at creating classification models for discrete data and regression models for continuous data. The machine learning or neural network model produced by supervised learning is usually used for prediction, for example to answer “What is the probability that this borrower will default on his loan?” or “How many widgets should we stock next month?”</p>
<p>Reinforcement learning trains an <em>actor</em> or <em>agent</em> to respond to an <em>environment</em> in a way that maximizes some <em>value</em>. That’s easier to understand in more concrete terms.</p>
<p>For example, AlphaGo, in order to learn to play (the action) the game of Go (the environment), first learned to mimic human Go players from a large data set of historical games (apprentice learning). It then improved its play through trial and error (reinforcement learning), by playing large numbers of Go games against independent instances of itself.</p>
<p>Note that AlphaGo doesn’t try to maximize the size of the win, like dan (black belt)-level human players usually do. It also doesn’t try to optimize the immediate position, like a novice human player would. AlphaGo maximizes the estimated probability of an eventual win to determine its next move. It doesn’t care whether it wins by one stone or 50 stones.</p>
<h2>Reinforcement learning applications</h2>
<p>Learning to play board games such as Go, shogi, and chess is not the only area where reinforcement learning has been applied. Two other areas are playing video games and teaching robots to perform tasks independently.</p>
<p>In 2013, DeepMind published a paper about learning control policies directly from high-dimensional sensory input using reinforcement learning. The applications were seven Atari 2600 games from the Arcade Learning Environment. A convolutional neural network, trained with a variant of Q-learning (one common method for reinforcement learning training), outperformed all previous approaches on six of the games and surpassed a human expert on three of them.</p>
<p>The convolutional neural network’s input was raw pixels and its output was a value function estimating future rewards. The convolutional-neural-network-based value function worked better than more common linear value functions. The choice of a convolutional neural network when the input is an image is unsurprising, as convolutional neural networks were designed to mimic the visual cortex.</p>
<p>DeepMind has since expanded this line of research to the real-time strategy game StarCraft II. The AlphaStar program learned StarCraft II by playing against itself to the point where it could almost always beat top players, at least for Protoss versus Protoss games. (Protoss is one of the alien races in StarCraft.)</p>
<p>Robotic control is another problem that has been attacked with deep reinforcement learning methods, meaning reinforcement learning plus deep neural networks, with the deep neural networks often being convolutional neural networks trained to extract features from video frames. Training with real robots is time-consuming, however. To reduce training time, many of the studies start off with simulations before trying out their algorithms on physical drones, robot dogs, humanoid robots, or robotic arms.</p>
<h2>How reinforcement learning works</h2>
<p>We’ve already discussed that reinforcement learning involves an agent interacting with an environment. The environment may have many <em>state variables</em>. The agent performs actions according to a <em>policy</em>, which may change the state of the environment. The environment or the training algorithm can send the agent <em>rewards</em> or <em>penalties</em> to implement the reinforcement. These may modify the policy, which constitutes learning.</p>
<p>For background, this is the scenario explored in the early 1950s by Richard Bellman, who developed dynamic programming to solve optimal control and Markov decision process problems. Dynamic programming is at the heart of many important algorithms for a variety of applications, and the Bellman equation is very much part of reinforcement learning.</p>
<p>A reward signifies what is good immediately. A <em>value</em>, on the other hand, specifies what is good in the long run. In general, the value of a state is the expected sum of future rewards. Action choices—policies—need to be computed on the basis of long-term values, not immediate rewards.</p>
<p>Effective policies for reinforcement learning need to balance <em>greed</em> or <em>exploitation</em>—going for the action that the current policy thinks will have the highest value—against <em>exploration</em>, randomly driven actions that may help improve the policy. There are many algorithms to control this, some exploring a small fraction ε of the time, and some starting with pure exploration and slowly converging to nearly pure greed as the learned policy becomes strong.</p>
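<p>A minimal tabular Q-learning sketch shows this balance in practice, on a made-up five-state corridor where the agent must learn to walk right to a goal (all values here are illustrative; ε is the exploration fraction):</p>

```python
import random

# Hypothetical 5-state corridor: move right from state 0 to reach a reward at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # learned action values

random.seed(0)
for _ in range(500):  # episodes of trial and error
    state = 0
    while state != GOAL:
        # Balance exploration (random action) against exploitation (greedy action).
        if random.random() < EPSILON:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] > Q[state][1] else 1
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: immediate reward plus discounted long-term value.
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][a])
        state = next_state

# The greedy policy learned from experience moves right (action 1) in every state.
print([0 if q[0] > q[1] else 1 for q in Q[:GOAL]])
```

<p>Note how the update rule encodes the reward-versus-value distinction from above: only the step into the goal earns an immediate reward, yet the discounted term γ·max Q propagates that value back so earlier states learn to act on long-term value rather than immediate reward.</p>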
<p>There are many algorithms for reinforcement learning, both model-based (e.g. dynamic programming) and model-free (e.g. Monte Carlo). Model-free methods tend to be more useful for actual reinforcement learning, because they are learning from experience, and exact models tend to be hard to create.</p>
<p>If you want to get into the weeds with reinforcement learning algorithms and theory, and you are comfortable with Markov decision processes, I’d recommend <em>Reinforcement Learning: An Introduction</em> by Richard S. Sutton and Andrew G. Barto. You want the 2<sup>nd</sup> edition, revised in 2018.</p>
<h2>AlphaGo and AlphaZero</h2>
<p>I mentioned earlier that AlphaGo started learning Go by training against a database of human Go games. That bootstrap got its deep-neural-network-based value function working at a reasonable strength.</p>
<p>For the next step in AlphaGo’s training, it played against itself—a lot—and used the game results to update the weights in its value and policy networks. That made the strength of the program rise above most human Go players.</p>
<p>At each move while playing a game, AlphaGo applies its value function to every legal move at that position, to rank them in terms of probability of leading to a win. Then it runs a Monte Carlo tree search algorithm from the board positions resulting from the highest-value moves, picking the move most likely to win based on those look-ahead searches. It uses the win probabilities to weight the amount of attention it gives to searching each move tree.</p>
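<p>One published form of this probability-weighted attention is the PUCT selection rule from AlphaGo-style tree search, which adds to a move’s estimated value an exploration bonus proportional to its prior win probability and shrinking with visit count. A sketch with illustrative constants (c_puct is a tunable parameter, not a value published by DeepMind):</p>

```python
import math

def puct_score(value_estimate, prior, parent_visits, child_visits, c_puct=1.5):
    """PUCT-style selection score: a move's estimated value plus an
    exploration bonus that grows with its prior probability and shrinks
    as the move accumulates visits."""
    bonus = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return value_estimate + bonus

# A well-explored strong move vs. an equally probable but unvisited one:
explored = puct_score(value_estimate=0.6, prior=0.4, parent_visits=100, child_visits=50)
fresh = puct_score(value_estimate=0.0, prior=0.4, parent_visits=100, child_visits=0)

# The unvisited move scores higher at first, so the search gives it attention
# in proportion to its prior; as visits accumulate, the value term dominates.
print(fresh > explored)  # True
```

<p>At each node the search descends into the child with the highest score, which is how high-probability moves receive the lion’s share of the look-ahead effort.</p>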
<p>The later AlphaGo Zero and AlphaZero programs skipped training against the database of human games. They started with no baggage except for the rules of the game and reinforcement learning. At the beginning they played random moves, but after learning from millions of games against themselves they played very well indeed. AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.</p>
<p>AlphaZero, as I mentioned earlier, was generalized from AlphaGo Zero to learn chess and shogi as well as Go. According to DeepMind, the amount of reinforcement learning training the AlphaZero neural network needs depends on the style and complexity of the game, taking roughly nine hours for chess, 12 hours for shogi, and 13 days for Go, running on multiple TPUs. In chess, AlphaZero’s guidance is much better than that of conventional chess-playing programs, reducing the tree space it needs to search. AlphaZero evaluates only tens of thousands of moves per decision, versus tens of millions for Stockfish, the strongest handcrafted chess engine.</p>
<p>These board games are not easy to master, and AlphaZero’s success says a lot about the power of reinforcement learning, neural network value and policy functions, and guided Monte Carlo tree search. It also says a lot about the skill of the researchers, and the power of TPUs.</p>
<p>Robotic control is a harder AI problem than playing board games or video games. As soon as you have to deal with the physical world, unexpected things happen. Nevertheless, there has been progress on this at a demonstration level, and the most powerful approaches currently seem to involve reinforcement learning and deep neural networks.</p>
<p>The post <a href="https://www.aiuniverse.xyz/reinforcement-learning-explained/">Reinforcement learning explained</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/reinforcement-learning-explained/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Deep learning techniques teach neural model to &#8216;play&#8217; retrosynthesis</title>
		<link>https://www.aiuniverse.xyz/deep-learning-techniques-teach-neural-model-to-play-retrosynthesis/</link>
					<comments>https://www.aiuniverse.xyz/deep-learning-techniques-teach-neural-model-to-play-retrosynthesis/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Thu, 06 Jun 2019 04:37:25 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[ACS Central Science]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[ENGINEERING]]></category>
		<category><![CDATA[Reinforcement]]></category>
		<category><![CDATA[researchers]]></category>
		<category><![CDATA[retrosynthesis]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=3556</guid>

					<description><![CDATA[<p>Source:- phys.org Researchers, from biochemists to material scientists, have long relied on the rich variety of organic molecules to solve pressing challenges. Some molecules may be useful in <a class="read-more-link" href="https://www.aiuniverse.xyz/deep-learning-techniques-teach-neural-model-to-play-retrosynthesis/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deep-learning-techniques-teach-neural-model-to-play-retrosynthesis/">Deep learning techniques teach neural model to ‘play’ retrosynthesis</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source:- phys.org</p>
<p>Researchers, from biochemists to material scientists, have long relied on the rich variety of organic molecules to solve pressing challenges. Some molecules may be useful in treating diseases, others for lighting our digital displays, still others for pigments, paints, and plastics. The unique properties of each molecule are determined by its structure—that is, by the connectivity of its constituent atoms. Once a promising structure is identified, there remains the difficult task of making the targeted molecule through a sequence of chemical reactions. But which ones?</p>
<p>Organic chemists generally work backwards from the target molecule to the starting materials using a process called retrosynthetic analysis. During this process, the chemist faces a series of complex and inter-related decisions. For instance, of the tens of thousands of different chemical reactions, which one should you choose to create the target molecule? Once that decision is made, you may find yourself with multiple reactant molecules needed for the reaction. If these molecules are not available to purchase, then how do you select the appropriate reactions to produce them? Intelligently choosing what to do at each step of this process is critical in navigating the huge number of possible paths.</p>
<p>Researchers at Columbia Engineering have developed a new technique based on reinforcement learning that trains a neural network model to correctly select the “best” reaction at each step of the retrosynthetic process. This form of AI provides a framework for researchers to design chemical syntheses that optimize user-specified objectives such as synthesis cost, safety, and sustainability. The new approach, published May 31 in <i>ACS Central Science</i>, is more successful (by ~60%) than existing strategies for solving this challenging search problem.</p>
<p>“Reinforcement learning has created computer players that are much better than humans at playing complex video games. Perhaps retrosynthesis is no different! This study gives us hope that reinforcement-learning algorithms will perhaps one day be better than human players at the ‘game’ of retrosynthesis,” says Alán Aspuru-Guzik, professor of chemistry and computer science at the University of Toronto, who was not involved with the study.</p>
<p>The team framed the challenge of retrosynthetic planning as a game like chess and Go, where the combinatorial number of possible choices is astronomical and the value of each choice remains uncertain until the synthesis plan is completed and its cost evaluated. Unlike earlier studies that used heuristic scoring functions—simple rules of thumb—to guide retrosynthetic planning, this new study used reinforcement learning to make judgments based on the neural model’s own experience.</p>
<p>“We’re the first to apply reinforcement learning to the problem of retrosynthetic analysis,” says Kyle Bishop, associate professor of chemical engineering. “Starting from a state of complete ignorance, where the model knows absolutely nothing about strategy and applies reactions randomly, the model can practice and practice until it finds a strategy that outperforms a human-defined heuristic.”</p>
<p>In their study, Bishop’s team focused on using the number of reaction steps as the measure of what makes a “good” synthetic pathway. They had their reinforcement learning model tailor its strategy with this goal in mind. Using simulated experience, the team trained the model’s neural network to estimate the expected synthesis cost or value of any given molecule based on a representation of its molecular structure.</p>
<p>The team plans to explore different goals in the future, for instance, training the model to minimize costs rather than the number of reactions, or to avoid molecules that could be toxic. The researchers are also trying to reduce the number of simulations required for the model to learn its strategy, as the training process was quite computationally expensive.</p>
<p>“We expect that our retrosynthesis game will soon follow the way of chess and Go, in which self-taught algorithms consistently outperform human experts,” Bishop notes. “And we welcome competition. As with chess-playing computer programs, competition is the engine for improvements in the state-of-the-art, and we hope that others can build on our work to demonstrate even better performance.”</p>
<p>The post <a href="https://www.aiuniverse.xyz/deep-learning-techniques-teach-neural-model-to-play-retrosynthesis/">Deep learning techniques teach neural model to ‘play’ retrosynthesis</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deep-learning-techniques-teach-neural-model-to-play-retrosynthesis/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Using Microlearning and Reinforcement Techniques for Sales Enablement Success</title>
		<link>https://www.aiuniverse.xyz/using-microlearning-and-reinforcement-techniques-for-sales-enablement-success/</link>
					<comments>https://www.aiuniverse.xyz/using-microlearning-and-reinforcement-techniques-for-sales-enablement-success/#comments</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Thu, 21 Sep 2017 07:31:51 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Microservices]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[corporate learning]]></category>
		<category><![CDATA[microlearning]]></category>
		<category><![CDATA[Reinforcement]]></category>
		<category><![CDATA[Reinforcement learning Techniques]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=1217</guid>

					<description><![CDATA[<p>Source &#8211; td.org Sales professionals have undoubtedly sat through many annual sales kick-off meetings, product launch training sessions, and classroom lectures wondering what they were trying to achieve. <a class="read-more-link" href="https://www.aiuniverse.xyz/using-microlearning-and-reinforcement-techniques-for-sales-enablement-success/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/using-microlearning-and-reinforcement-techniques-for-sales-enablement-success/">Using Microlearning and Reinforcement Techniques for Sales Enablement Success</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source &#8211; <strong>td.org</strong></p>
<p>Sales professionals have undoubtedly sat through many annual sales kick-off meetings, product launch training sessions, and classroom lectures wondering what they were meant to achieve. Managers hope these events will impart the knowledge and skills of top-performing sales reps to less-experienced peers, while also honing important abilities like storytelling, customizing a sales message, and delivering a presentation.</p>
<p>Knowledge is a salesperson’s currency, and sales training and enablement practices exist to help reps become better at the bottom line: selling. However, a not-so-secret reality of this process is that traditional sales training methods are no longer effective on their own.</p>
<p>Consider a few statistics: About 55 percent of salespeople lack basic sales skills. In addition, approximately 50 percent of content learned is forgotten within five weeks of a training event, and 84 percent within 90 days. Yet U.S. businesses continue to spend a whopping $160 billion annually on employee learning and training.</p>
<p>Given the time and money companies invest, it’s not a question of increasing effort or expense. Rather, the fundamental problem is how training is delivered: in intense yet infrequent bursts, such as an annual sales kick-off meeting, or through lengthy, one-size-fits-all e-learning courses provided through corporate learning management systems.</p>
<p>So how can we make learning more efficient and effective so that knowledge is actually retained over time?</p>
<h2>Microlearning for Macro-Effectiveness</h2>
<p>Not only are today’s sales reps forgetful, they’re busy as well. In fact, research shows that, on average, employees are interrupted every three minutes. Sitting through lengthy online training courses or getting pulled out of the field to attend kick-off events is not an efficient use of sales reps’ valuable time. What they do have time for is bite-sized pieces of information they can consume throughout their day to stay up-to-date and retain knowledge.</p>
<p>This type of information sharing is known as microlearning, and technology today is uniquely poised to enable it. Microlearning describes any learning model that operates on the principle that people learn more effectively if content is broken into smaller units and delivered in short sessions. Mobile devices give sales organizations the power to help reps practice in short bursts whenever (and wherever) they have a few minutes of free time.</p>
<p>And science backs it up, too. Research shows that microlearning works because presenting information in small chunks reduces cognitive load and eases the perceived burden of learning. Giving reps the option to review information on their own schedule (and in accordance with their own attention span) makes it easier for them to engage throughout a busy week.</p>
<h3>Combining Microlearning With Reinforcement Learning</h3>
<p>Microlearning is most effective when combined with spaced repetition and reinforcement learning, techniques that periodically revisit new material. Baseline training courses put new information only into a learner’s short-term memory, since each topic is visited just once or twice. Retaining that information still takes conscious effort: a person’s ability to recall it fades over time unless the material is revisited.</p>
<p>Because we get hit with all sorts of stimuli throughout the day, the brain has evolved to determine which information is most important by registering how often it’s presented. So the more we revisit a given concept, the more we strengthen the information pathways (synapses) in the brain for future recall.</p>
<p>Microlearning and reinforcement learning techniques give busy people a practical way to take advantage of this, using ongoing exercises, coaching, quizzing, and drilling. Reinforcement works best when new information is reintroduced within 24 hours, and again in the following days and weeks, with the gaps between review sessions gradually widening. In fact, studies show a 30 to 55 percent improvement in knowledge recall when using spaced repetition.</p>
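<p>The widening-interval schedule described above can be sketched in a few lines. This is an illustrative example only; the one-day starting gap and the doubling factor are assumptions, not figures from the research cited.</p>

```python
# Illustrative spaced-repetition scheduler: first review within 24 hours,
# then gradually widening gaps. Starting gap and growth factor are assumed.
from datetime import date, timedelta

def review_schedule(training_day, n_reviews=5, first_gap_days=1, factor=2):
    """Return review dates with exponentially increasing spacing."""
    dates, gap, day = [], first_gap_days, training_day
    for _ in range(n_reviews):
        day = day + timedelta(days=gap)
        dates.append(day)
        gap *= factor  # widen the interval after each review
    return dates

schedule = review_schedule(date(2017, 9, 21))
print([d.isoformat() for d in schedule])
# gaps of 1, 2, 4, 8, and 16 days after the initial training session
```

<p>In practice the next interval would also adapt to quiz performance, shrinking after a missed answer, which is how most commercial spaced-repetition tools behave.</p>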
<p>Many successful sales organizations are using reinforcement learning and spaced repetition techniques, such as quizzes and flashcards. They also break down content from sales kick-off and product training events into smaller, bite-sized videos that can be watched at regular intervals in order to more effectively deliver this type of learning. Other best practices include gamification and the sharing of peer-generated content that can be consumed in bite-sized pieces at regular intervals.</p>
<p>Using these techniques moves information from the brain’s prefrontal cortex to the high-capacity, long-term memory of the hippocampus, where progressively less time and effort are needed to activate it for later retrieval. Since salespeople can’t lean on reference material to fill knowledge gaps when questions arise in the field, they have to know it all cold. Having knowledge down pat also helps reps present more naturally, so they don’t have to make a huge recall effort in front of the customer.</p>
<h3>In Defense of Sales Training</h3>
<p>Despite the effectiveness of microlearning and reinforcement techniques, you can’t learn everything you need to sell successfully in bite-sized chunks. Honing skills such as storytelling and delivering a compelling message as part of a sales presentation still benefits from face-to-face interaction.</p>
<p>For modern-day sales organizations, video is proving to be a powerful tool to help busy sales reps get this much-needed face time with colleagues. Not only does the brain process visuals 60,000 times faster than text, which allows more information to be delivered in smaller chunks, but employees are also 75 percent more likely to watch a video than to read documents, emails, or web articles. Using video, trainers and managers bypass the need for in-person meetings and ride-alongs to facilitate interactive learning for their reps, which can be further strengthened through reinforcement after the fact.</p>
<p>While the sales kick-off meeting isn’t going away anytime soon, the skills learned during in-person training can be augmented by microlearning and reinforcement learning techniques. Salespeople battle tight schedules and constant travel, so learning content needs to be accessible on-demand wherever they’re located. Microlearning is a commonsense approach to making sure your reps always deliver when a deal is on the line.</p>
<p>The post <a href="https://www.aiuniverse.xyz/using-microlearning-and-reinforcement-techniques-for-sales-enablement-success/">Using Microlearning and Reinforcement Techniques for Sales Enablement Success</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/using-microlearning-and-reinforcement-techniques-for-sales-enablement-success/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
