<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DeepMind Archives - Artificial Intelligence</title>
	<atom:link href="https://www.aiuniverse.xyz/tag/deepmind/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aiuniverse.xyz/tag/deepmind/</link>
	<description>Exploring the universe of Intelligence</description>
	<lastBuildDate>Wed, 18 Nov 2020 05:28:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning</title>
		<link>https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/</link>
					<comments>https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 18 Nov 2020 05:28:19 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[2D environments]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Lab2D]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=12371</guid>

					<description><![CDATA[<p>Source: computing.co.uk Alphabet subsidiary DeepMind announced on Monday that it has open-sourced Lab2D, a scalable environment simulator for artificial intelligence (AI) research that facilitates researcher-led experimentation with environment <a class="read-more-link" href="https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/">DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: computing.co.uk</p>



<p>Alphabet subsidiary DeepMind announced on Monday that it has open-sourced Lab2D, a scalable environment simulator for artificial intelligence (AI) research that facilitates researcher-led experimentation with environment design.</p>



<p>DeepMind describes Lab2D as a system designed to support creation of two-dimensional (2D) layered, discrete &#8220;grid-world&#8221; environments, in which pieces move much as chess pieces move on a chess board.</p>



<p>The system is particularly tailored for multi-agent reinforcement learning, according to Lab2D researchers.</p>



<p>The computationally intensive engine for Lab2D is written in C++ for efficiency, while most of the level-specific logic is written in Lua.</p>



<p>&#8220;The environments are &#8216;grid worlds&#8217;, which are defined with a combination of simple text-based maps for the layout of the world, and Lua code for its behaviour,&#8221; the researchers state in their study paper.</p>



<p>&#8220;Machine learning agents interact with these environments through one of two APIs, the Python <em>dm_env</em> API or a custom <em>C</em> API (which is also used by DeepMind Lab).&#8221;</p>
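<p>To make the text-map idea concrete, here is a minimal, purely illustrative Python sketch of a grid world driven by an ASCII map. The class, map format, and action names are hypothetical stand-ins invented for this example, not the actual Lab2D or dm_env API, and Lab2D defines level behaviour in Lua rather than Python.</p>

```python
# Illustrative sketch only: a tiny ASCII-map grid world in the spirit of
# Lab2D's text-based layouts. All names here are hypothetical.

MAP = """\
#####
#P.G#
#####"""

class GridWorld:
    def __init__(self, text_map):
        self.grid = [list(row) for row in text_map.splitlines()]
        for r, row in enumerate(self.grid):
            for c, ch in enumerate(row):
                if ch == "P":              # player start position
                    self.pos = (r, c)

    def step(self, action):
        """Move like a chess piece on a board; '#' cells are walls."""
        dr, dc = {"up": (-1, 0), "down": (1, 0),
                  "left": (0, -1), "right": (0, 1)}[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if self.grid[r][c] != "#":         # walls block movement
            self.pos = (r, c)
        reached_goal = self.grid[self.pos[0]][self.pos[1]] == "G"
        return self.pos, 1.0 if reached_goal else 0.0

env = GridWorld(MAP)
pos, reward = env.step("right")   # onto the empty '.' cell, reward 0.0
pos, reward = env.step("right")   # onto the 'G' goal cell, reward 1.0
```

<p>A real Lab2D level would pair a map like this with Lua callbacks defining rewards and object behaviour, and expose the step loop through the dm_env or C APIs mentioned above.</p>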



<p>The researchers note that in the rush to create artificial general intelligence that will work in any environment, &#8216;tinkering&#8217; with environmental variables has become unfashionable. Nevertheless, in real-world use cases simulated environments are essential for discovering how systems based on reinforcement learning develop an understanding of the conditions in which they operate.</p>



<p>They make the case that 2D environments are inherently easier to understand than three-dimensional ones, with little if any loss of expressiveness, and that they are more performant and easier to use.</p>



<p>&#8220;Rich complexity along numerous dimensions can be studied in 2D just as readily as in 3D, if not more so.&#8221;</p>



<p>They note that 2D worlds have been successfully used to study problems as diverse as navigation, social complexity, imperfect information, and abstract reasoning.</p>



<p>&#8220;2D worlds can often capture the relevant complexity of the problem at hand without the need for continuous-time physical environments.&#8221;</p>



<p>Another advantage of 2D worlds is that they are easier to program and design than their 3D counterparts, except where a 3D world genuinely exploits spatial structure or physical dynamics beyond the reach of 2D.</p>



<p>Moreover, 2D worlds do not need complex 3D assets to be evocative, or any reasoning about lighting, shaders, and projections.</p>



<p>The decision to open-source Lab2D comes after DeepMind released OpenSpiel, a reinforcement learning framework for video games, designed to &#8220;promote general multi-agent reinforcement learning across many different game types, in a similar way as general game-playing but with a heavy emphasis on learning and not in competition form.&#8221;</p>



<p>Lab2D seeks to build on this work by providing a means to study how agents learn.</p>



<p>&#8220;We think that progress toward artificial general intelligence requires robust simulation platforms to enable in silico exploration of agent learning, skill acquisition, and careful measurement. We hope that the system we introduce here, DeepMind Lab2D, can fill this role.&#8221;</p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/">DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>HOW DEEPMIND ALGORITHMS HELPED IMPROVE THE ACCURACY OF GOOGLE MAPS?</title>
		<link>https://www.aiuniverse.xyz/how-deepmind-algorithms-helped-improve-the-accuracy-of-google-maps/</link>
					<comments>https://www.aiuniverse.xyz/how-deepmind-algorithms-helped-improve-the-accuracy-of-google-maps/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Thu, 10 Sep 2020 09:01:13 +0000</pubDate>
				<category><![CDATA[Google AI]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[local road network]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11476</guid>

					<description><![CDATA[<p>Source: analyticsinsight.net DeepMind is one of the companies that are leading the AI charge and coming up with innovative uses of AI. This London-based AI lab has been <a class="read-more-link" href="https://www.aiuniverse.xyz/how-deepmind-algorithms-helped-improve-the-accuracy-of-google-maps/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/how-deepmind-algorithms-helped-improve-the-accuracy-of-google-maps/">HOW DEEPMIND ALGORITHMS HELPED IMPROVE THE ACCURACY OF GOOGLE MAPS?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: analyticsinsight.net</p>



<p>DeepMind is one of the companies leading the AI charge and coming up with innovative uses of AI. This London-based AI lab has been under the umbrella of Alphabet since the latter acquired it in January 2014. While Google’s AI ventures have kept it busy, DeepMind has been most helpful when it comes to Google Maps. For years, it has been a challenge to design machine-learning algorithms that train AI models and software to help in navigation, especially in unstructured surroundings. Therefore, understanding how AI can learn to cruise through an environment and guide us is an ongoing area of interest for researchers.</p>



<p>The task is arduous primarily because long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by familiar landmarks and robust visual processing, that can simultaneously support continuous self-localization (“I am here”) and a representation of the goal (“I am going there”). This is where DeepMind’s deep reinforcement learning helps. Addressing it matters because people rely on the accuracy of Google Maps: every day, the app provides useful directions, real-time traffic information, and information on businesses to millions of people, along with accurate traffic predictions and estimated times of arrival (ETAs). As a result, it is crucial that the system mirror the ever-changing landscape of urban roads.</p>



<p>Recently, researchers at DeepMind teamed up with Google Maps to improve the accuracy of real-time ETAs by up to 50% in places like Berlin, Jakarta, São Paulo, Sydney, Tokyo, and Washington D.C. by using advanced machine learning techniques. At present, the Google Maps traffic prediction system consists of a route analyzer for processing traffic information to construct Supersegments (multiple adjacent segments of road that share significant traffic volume). It also has a Graph Neural Network model, which is optimized with various objectives and predicts the travel time for each Supersegment.</p>



<p>The data used to train DeepMind’s machine learning model came from authoritative inputs from local governments and real-time feedback from users. The authoritative data lets Google Maps learn about speed limits, tolls, and road restrictions due to things like construction, excavation works, or COVID-19 shutdowns. Meanwhile, feedback from users lets Google know that paved roads are better for driving than unpaved ones, and helps the neural network model learn to prefer a long stretch of highway over a shorter route with multiple stops.</p>



<p>After collecting the data, the Graph Neural Network treats the local road network as a graph, with each route segment represented as a node and edges connecting segments that are consecutive on the same road or joined through an intersection. When the message-passing algorithm executes, the neural network learns how messages propagate and how they affect node and edge states. The Supersegments are road subgraphs sampled at random in proportion to traffic density; once a single model was trained on these subgraphs, the algorithm was deployed at scale.</p>
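<p>As a rough illustration of the message passing described above, the following sketch runs one round of a generic graph neural network over a toy three-segment road graph. The features, weights, and graph are invented for illustration; this is a generic GNN step, not Google's production model.</p>

```python
import numpy as np

# One message-passing round over a toy road-segment graph (a generic
# GNN step, not Google's production model; all numbers are made up).

# Node features per segment, e.g. [current speed, historical speed].
node_feats = np.array([[30.0, 40.0],   # segment 0
                       [10.0, 35.0],   # segment 1 (congested)
                       [50.0, 50.0]])  # segment 2

edges = [(0, 1), (1, 2)]               # segment 0 -> 1 -> 2

rng = np.random.default_rng(0)
W_msg = rng.normal(size=(2, 2))        # message weights (untrained)
W_upd = rng.normal(size=(4, 2))        # update weights (untrained)

# 1. Each node sends a transformed message along its outgoing edges,
#    and incoming messages are summed at the destination node.
messages = {i: np.zeros(2) for i in range(len(node_feats))}
for src, dst in edges:
    messages[dst] += node_feats[src] @ W_msg

# 2. Each node updates its state from [own features, summed messages].
new_feats = np.array([
    np.tanh(np.concatenate([node_feats[i], messages[i]]) @ W_upd)
    for i in range(len(node_feats))
])

print(new_feats.shape)  # (3, 2)
```

<p>Stacking several such rounds lets information about congestion on one segment propagate multiple segments down the road; a readout over each Supersegment's node states would then predict its travel time.</p>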



<p>Through the Graph Neural Network, researchers were able to carry out spatiotemporal reasoning by incorporating relational learning biases to model the connectivity structure of real-world road networks. Google Maps product manager Johann Lau says, “We saw up to a 50 percent decrease in worldwide traffic when lockdowns started in early 2020. To account for this sudden change, we’ve recently updated our models to become more agile — automatically prioritizing historical traffic patterns from the last two to four weeks, and deprioritizing patterns from any time before that.”</p>
<p>The post <a href="https://www.aiuniverse.xyz/how-deepmind-algorithms-helped-improve-the-accuracy-of-google-maps/">HOW DEEPMIND ALGORITHMS HELPED IMPROVE THE ACCURACY OF GOOGLE MAPS?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/how-deepmind-algorithms-helped-improve-the-accuracy-of-google-maps/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>DeepMind’s Newest AI Programs Itself to Make All the Right Decisions</title>
		<link>https://www.aiuniverse.xyz/deepminds-newest-ai-programs-itself-to-make-all-the-right-decisions/</link>
					<comments>https://www.aiuniverse.xyz/deepminds-newest-ai-programs-itself-to-make-all-the-right-decisions/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Mon, 27 Jul 2020 05:34:12 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[AI Programs]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Developing]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=10489</guid>

					<description><![CDATA[<p>Source: singularityhub.com When Deep Blue defeated world chess champion Garry Kasparov in 1997, it may have seemed artificial intelligence had finally arrived. A computer had just taken down one <a class="read-more-link" href="https://www.aiuniverse.xyz/deepminds-newest-ai-programs-itself-to-make-all-the-right-decisions/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deepminds-newest-ai-programs-itself-to-make-all-the-right-decisions/">DeepMind’s Newest AI Programs Itself to Make All the Right Decisions</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: singularityhub.com</p>



<p>When Deep Blue defeated world chess champion Garry Kasparov in 1997, it may have seemed artificial intelligence had finally arrived. A computer had just taken down one of the top chess players of all time. But it wasn’t to be.</p>



<p>Though Deep Blue was meticulously programmed top-to-bottom to play chess, the approach was too labor-intensive, too dependent on clear rules and bounded possibilities to succeed at more complex games, let alone in the real world. The next revolution would take a decade and a half, when vastly more computing power and data revived machine learning, an old idea in artificial intelligence just waiting for the world to catch up.</p>



<p>Today, machine learning dominates, mostly by way of a family of algorithms called deep learning, while symbolic AI, the dominant approach in Deep Blue’s day, has faded into the background.</p>



<p>Key to deep learning’s success is the fact that the algorithms basically write themselves. Given some high-level programming and a dataset, they learn from experience. No engineer anticipates every possibility in code. The algorithms just figure it out.</p>



<p>Now, Alphabet’s DeepMind is taking this automation further by developing deep learning algorithms that can handle programming tasks that have been, to date, the sole domain of the world’s top computer scientists (and take them years to write).</p>



<p>In a paper recently published on the pre-print server arXiv, a database for research papers that haven’t been peer reviewed yet, the DeepMind team described a new deep reinforcement learning algorithm that was able to discover its own value function—a critical programming rule in deep reinforcement learning—from scratch.</p>



<p>Surprisingly, the algorithm was also effective beyond the simple environments it trained in, going on to play Atari games—a different, more complicated task—at a level that was, at times, competitive with human-designed algorithms and achieving superhuman levels of play in 14 games.</p>



<p>DeepMind says the approach could accelerate the development of reinforcement learning algorithms and even lead to a shift in focus, where instead of spending years writing the algorithms themselves, researchers work to perfect the environments in which they train.</p>



<h3 class="wp-block-heading"><strong>Pavlov’s Digital Dog</strong></h3>



<p>First, a little background.</p>



<p>The three main approaches to deep learning are supervised, unsupervised, and reinforcement learning.</p>



<p>The first two consume huge amounts of data (like images or articles), look for patterns in the data, and use those patterns to inform actions (like identifying an image of a cat). To us, this is a pretty alien way to learn about the world. Not only would it be mind-numbingly dull to review millions of cat images, it’d take us years or more to do what these programs do in hours or days. And of course, we can learn what a cat looks like from just a few examples. So why bother?</p>



<p>While supervised and unsupervised deep learning emphasize the <em>machine</em> in machine learning, reinforcement learning is a bit more biological. It actually <em>is</em> the way we learn. Confronted with several possible actions, we predict which will be most rewarding based on experience—weighing the pleasure of eating a chocolate chip cookie against avoiding a cavity and trip to the dentist.</p>



<p>In deep reinforcement learning, algorithms go through a similar process as they take action. In the Atari game Breakout, for instance, a player guides a paddle to bounce a ball at a ceiling of bricks, trying to break as many as possible. When playing Breakout, should an algorithm move the paddle left or right? To decide, it runs a projection—this is the value function—of which direction will maximize the total points, or rewards, it can earn.</p>
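<p>That projection can be sketched with a tabular action-value function. Everything below is a toy invented for illustration: real Atari agents approximate these values with neural networks over raw pixels, and the states and numbers here are made up.</p>

```python
# Toy action-value table: Q[state][action] estimates the total future
# reward ("points") of moving the paddle in each direction.
Q = {
    "ball_left_of_paddle":  {"left": 2.5, "right": 0.3},
    "ball_right_of_paddle": {"left": 0.4, "right": 2.8},
}

def choose_action(state):
    """Greedy choice: move whichever way projects the most points."""
    return max(Q[state], key=Q[state].get)

def update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning step: nudge the estimate toward the observed
    reward plus the discounted best next value, so experience steadily
    improves the projection."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

action = choose_action("ball_left_of_paddle")   # -> "left"
update("ball_left_of_paddle", "left", 1.0, "ball_right_of_paddle")
```

<p>Repeating choose-then-update over many moves and many games is what turns raw experience into better play.</p>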



<p>Move by move, game by game, an algorithm combines experience and value function to learn which actions bring greater rewards and improves its play, until eventually, it becomes an uncanny Breakout player.</p>



<h3 class="wp-block-heading"><strong>Learning to Learn (Very Meta)</strong></h3>



<p>So, a key to deep reinforcement learning is developing a good value function. And that’s difficult. According to the DeepMind team, it takes years of manual research to write the rules guiding algorithmic actions—which is why automating the process is so alluring. Their new Learned Policy Gradient (LPG) algorithm makes solid progress in that direction.</p>



<p>LPG trained in a number of toy environments. Most of these were “gridworlds”—literally two-dimensional grids with objects in some squares. The AI moves square to square and earns points or punishments as it encounters objects. The grids vary in size, and the distribution of objects is either set or random. The training environments offer opportunities to learn fundamental lessons for reinforcement learning algorithms.</p>



<p>In LPG’s case, though, it had no value function to guide that learning.</p>



<p>Instead, LPG has what DeepMind calls a “meta-learner.” You might think of this as an algorithm within an algorithm that, by interacting with its environment, discovers both “what to predict,” thereby forming its version of a value function, and “how to learn from it,” applying its newly discovered value function to each decision it makes in the future.</p>
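<p>The shape of that idea can be caricatured in a few lines. In this deliberately tiny, invented sketch, the outer loop tunes only a single piece of the update rule (a learning rate) by how well inner learners do across several environments; LPG itself meta-learns far richer rules, including what to predict in the first place.</p>

```python
# Caricature of a meta-learner: an outer loop that evaluates candidate
# update-rule settings by the performance of inner learners. Entirely
# invented for illustration; LPG discovers much richer rules.

def inner_train(lr, target, steps=20):
    """Inner learner: estimate `target` by gradient steps on squared
    error. Returns a score (higher is better)."""
    estimate = 0.0
    for _ in range(steps):
        estimate -= lr * 2 * (estimate - target)   # grad of (x - target)^2
    return -(estimate - target) ** 2

def meta_learn(environments, candidates=(0.01, 0.1, 0.5)):
    """Outer loop: keep whichever update-rule setting generalizes best
    across all the training environments."""
    def score(lr):
        return sum(inner_train(lr, target) for target in environments)
    return max(candidates, key=score)

best_lr = meta_learn(environments=[1.0, -2.0, 3.0])   # -> 0.5
```

<p>Swap the scalar learning rate for a parameterized prediction-and-update network, and the candidate search for gradient-based meta-optimization, and you have a very rough outline of the approach.</p>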



<p>Prior work in the area has had some success, but according to DeepMind, LPG is the first algorithm to discover reinforcement learning rules from scratch and to generalize beyond training. The latter was particularly surprising because Atari games are so different from the simple worlds LPG trained in—that is, it had never seen anything like an Atari game.</p>



<h3 class="wp-block-heading"><strong>Time to Hand Over the Reins? Not Just Yet</strong></h3>



<p>LPG is still behind advanced human-designed algorithms, the researchers said. But it outperformed a human-designed benchmark in its training environments and even in some Atari games, which suggests it isn’t strictly worse, just that it specializes in some environments.</p>



<p>This is where there’s room for improvement and more research.</p>



<p>The more environments LPG saw, the more it could successfully generalize. Intriguingly, the researchers speculate that with enough well-designed training environments, the approach might yield a general-purpose reinforcement learning algorithm.</p>



<p>At the least, though, they say further automation of algorithm discovery—that is, algorithms learning to learn—will accelerate the field. In the near term, it can help researchers more quickly develop hand-designed algorithms. Further out, as self-discovered algorithms like LPG improve, engineers may shift from manually developing the algorithms themselves to building the environments where they learn.</p>



<p>Deep learning long ago left Deep Blue in the dust at games. Perhaps algorithms learning to learn will be a winning strategy in the real world too.</p>
<p>The post <a href="https://www.aiuniverse.xyz/deepminds-newest-ai-programs-itself-to-make-all-the-right-decisions/">DeepMind’s Newest AI Programs Itself to Make All the Right Decisions</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deepminds-newest-ai-programs-itself-to-make-all-the-right-decisions/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>DeepMind’s AI automatically generates reinforcement learning algorithms</title>
		<link>https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms-2/</link>
					<comments>https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms-2/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 22 Jul 2020 07:01:41 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[researchers]]></category>
		<category><![CDATA[software]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=10376</guid>

					<description><![CDATA[<p>Source: bestgamingpro.com In a study published on the preprint server Arxiv.org, DeepMind researchers describe a reinforcement learning algorithm-generating approach that discovers what to predict and how to learn it <a class="read-more-link" href="https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms-2/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms-2/">DeepMind’s AI automatically generates reinforcement learning algorithms</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: bestgamingpro.com</p>



<p>In a study published on the preprint server Arxiv.org, DeepMind researchers describe a reinforcement learning algorithm-generating approach that discovers what to predict and how to learn it by interacting with environments. They claim the generated algorithms perform well on a range of challenging Atari video games, achieving “non-trivial” performance indicative of the approach’s generalizability.</p>



<p>Reinforcement learning algorithms — algorithms that allow software agents to learn in environments by trial and error using feedback — update an agent’s parameters according to one of several rules. These rules are usually discovered through years of research, and automating their discovery from data could lead to more efficient algorithms, or algorithms better adapted to specific environments.</p>



<p>DeepMind’s answer is a meta-learning framework that jointly discovers what a particular agent should predict and how to use the predictions for policy improvement. (In reinforcement learning, a “policy” defines the learning agent’s way of behaving at a given time.) Their architecture — learned policy gradient (LPG) — allows the update rule (that is, the meta-learner) to decide what the agent’s outputs should be predicting, while the framework discovers rules via multiple learning agents, each of which interacts with a different environment.</p>



<p>In experiments, the researchers evaluated LPG directly on complex Atari games including Tutankham, Breakout, and Yars’ Revenge. They found that it generalized to the games “reasonably well” compared with existing algorithms, despite the fact that the training environments consisted of basic tasks much simpler than Atari games. Moreover, the agents trained with LPG managed to achieve “superhuman” performance on 14 games without relying on hand-designed reinforcement learning components.</p>



<p>The coauthors noted that LPG still lags behind some advanced reinforcement learning algorithms. But during the experiments, its generalization performance improved quickly as the number of training environments grew, suggesting it might be feasible to discover a general-purpose reinforcement learning algorithm once a larger set of environments is available for meta-training.</p>



<p>&#8220;The proposed approach has the potential to dramatically accelerate the process of discovering new reinforcement learning algorithms by automating discovery in a data-driven way. If the proposed research direction succeeds, this could shift the research paradigm from manually developing reinforcement learning algorithms to building a proper set of environments so that the resulting algorithm is efficient,&#8221; the researchers wrote. &#8220;Moreover, the proposed approach may also serve as a tool to assist reinforcement learning researchers in developing and improving their hand-designed algorithms. In this case, the proposed approach can be used to provide insights about what a good update rule looks like depending on the architecture that researchers provide as input, which could speed up the manual discovery of reinforcement learning algorithms.&#8221;</p>
<p>The post <a href="https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms-2/">DeepMind’s AI automatically generates reinforcement learning algorithms</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms-2/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>DeepMind’s AI automatically generates reinforcement learning algorithms</title>
		<link>https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms/</link>
					<comments>https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Tue, 21 Jul 2020 07:12:04 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[automatically]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[software]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=10352</guid>

					<description><![CDATA[<p>Source: venturebeat.com In a study published on the preprint server Arxiv.org, DeepMind researchers describe a reinforcement learning algorithm-generating technique that discovers what to predict and how to learn it by interacting <a class="read-more-link" href="https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms/">DeepMind’s AI automatically generates reinforcement learning algorithms</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: venturebeat.com</p>



<p>In a study published on the preprint server Arxiv.org, DeepMind researchers describe a reinforcement learning algorithm-generating technique that discovers what to predict and how to learn it by interacting with environments. They claim the generated algorithms perform well on a range of challenging Atari video games, achieving “non-trivial” performance indicative of the technique’s generalizability.</p>



<p>Reinforcement learning algorithms — algorithms that enable software agents to learn in environments by trial and error using feedback — update an agent’s parameters according to one of several rules. These rules are usually discovered through years of research, and automating their discovery from data could lead to more efficient algorithms, or algorithms better adapted to specific environments.</p>



<p>DeepMind’s solution is a meta-learning framework that jointly discovers what a particular agent should predict and how to use the predictions for policy improvement. (In reinforcement learning, a “policy” defines the learning agent’s way of behaving at a given time.) Their architecture — learned policy gradient (LPG) — allows the update rule (that is, the meta-learner) to decide what the agent’s outputs should be predicting while the framework discovers rules via multiple learning agents, each of which interacts with a different environment.</p>
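<p>For background, here is the kind of hand-designed update rule this line of work aims to discover automatically: a vanilla policy-gradient (REINFORCE) update for a two-armed bandit with a softmax policy. The setup is invented for illustration and is not the paper's method.</p>

```python
import math
import random

# Hand-designed policy-gradient (REINFORCE) update on a two-armed
# bandit, the sort of fixed rule a meta-learner would instead discover.
# The bandit and all numbers are invented for illustration.

random.seed(0)
prefs = [0.0, 0.0]            # softmax preferences (logits) per action
TRUE_REWARD = [0.2, 0.8]      # arm 1 pays off more often on average

def policy():
    """Softmax over preferences -> action probabilities."""
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(500):
    probs = policy()
    a = 0 if random.random() < probs[0] else 1            # sample action
    r = 1.0 if random.random() < TRUE_REWARD[a] else 0.0  # sample reward
    # Fixed human-designed rule: grad log pi(i) = 1[i == a] - pi(i),
    # scaled by the reward and a step size.
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += 0.1 * r * grad

# After training, the policy should strongly prefer the better arm 1.
```

<p>LPG-style work replaces this fixed update rule with one that is itself learned from interaction across many environments.</p>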



<p>In experiments, the researchers evaluated LPG directly on complex Atari games including Tutankham, Breakout, and Yars’ Revenge. They found that it generalized to the games “reasonably well” when compared with existing algorithms, despite the fact that the training environments involved basic tasks much simpler than Atari games. Moreover, the agents trained with LPG managed to achieve “superhuman” performance on 14 games without relying on hand-designed reinforcement learning components.</p>



<p>The coauthors noted that LPG still lags behind some advanced reinforcement learning algorithms. But during the experiments, its generalization performance improved quickly as the number of training environments grew, suggesting it might be feasible to discover a general-purpose reinforcement learning algorithm once a larger set of environments is available for meta-training.</p>



<p>“The proposed approach has the potential to dramatically accelerate the process of discovering new reinforcement learning algorithms by automating the process of discovery in a data-driven way. If the proposed research direction succeeds, this could shift the research paradigm from manually developing reinforcement learning algorithms to building a proper set of environments so that the resulting algorithm is efficient,” the researchers wrote. “Additionally, the proposed approach may also serve as a tool to assist reinforcement learning researchers in developing and improving their hand-designed algorithms. In this case, the proposed approach can be used to provide insights about what a good update rule looks like depending on the architecture that researchers provide as input, which could speed up the manual discovery of reinforcement learning algorithms.”</p>
<p>The post <a href="https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms/">DeepMind’s AI automatically generates reinforcement learning algorithms</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deepminds-ai-automatically-generates-reinforcement-learning-algorithms/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Reinforcement Learning: The Next Big Thing For AI (Artificial Intelligence)?</title>
		<link>https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/</link>
					<comments>https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Thu, 02 Jul 2020 06:43:00 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Future]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[reinforcement learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=9918</guid>

					<description><![CDATA[<p>Source: forbes.com When it comes to AI, much of the attention has been on deep learning.&#160;And for good reason.&#160;This part of the AI world has seen great <a class="read-more-link" href="https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/">Reinforcement Learning: The Next Big Thing For AI (Artificial Intelligence)?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: forbes.com</p>



<p>When it comes to AI, much of the attention has been on deep learning.&nbsp;And for good reason.&nbsp;This part of the AI world has seen great strides, such as with image recognition.</p>



<p>But of course, there are other areas of AI that look promising, such as reinforcement learning.&nbsp;Keep in mind that cutting-edge companies like Google’s DeepMind and OpenAI have already made breakthroughs with this approach.</p>



<p>So what is reinforcement learning? Well, interestingly enough, it is not new. “Reinforcement learning is a classic behavioral phenomenon, known in the psychology literature since the early 1950s,” said Dr. Matt Johnson, a professor of psychology at Hult International Business School and the author of Blindsight: The (Mostly) Hidden Ways Marketing Reshapes Our Brains. “In its simplest form, it states that the frequency of a behavior will go up or down depending on the direct consequences of that behavior. This is true of animal behavior as well as human behavior.”</p>



<p>But some of the key principles of reinforcement learning have been applied to AI models. This is often referred to as deep reinforcement learning (since it is leveraged with deep learning).</p>



<p>“Reinforcement learning entails an agent, action and reward,” said Ankur Taly, who is the head of data science at Fiddler. “The agent, such as a robot or character, interacts with its surrounding environment and observes a specific activity, responding accordingly to produce a beneficial or desired result. Reinforcement learning adheres to a specific methodology and determines the best means to obtain the best result. It’s very similar to the structure of how we play a video game, in which the agent engages in a series of trials to obtain the highest score or reward. Over several iterations, it learns to maximize its cumulative reward.”</p>
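<p>The agent, action, and reward loop Taly describes can be sketched with tabular Q-learning on a toy environment. Everything below (the four-state chain, the hyperparameters, the reward of 1 at the goal) is illustrative, not from the article:</p>

```python
import random

# Toy deterministic chain: states 0..3, actions 0 (left) / 1 (right).
# Reaching state 3 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy selection: mostly exploit, occasionally explore
            if rng.random() < eps:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Q-learning update: move toward reward + discounted best next value
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = train()
policy = [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]
print(policy)  # the learned greedy policy moves right, toward the goal
```

<p>Over many episodes the cumulative-reward signal alone is enough for the agent to discover the “move right” policy, exactly the trial-and-iteration structure Taly describes.</p>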



<p>In fact, some of the most interesting use cases for reinforcement learning have been with complex games. Consider the case of DeepMind’s AlphaGo. The system used reinforcement learning to quickly learn how to play Go and was able to beat the world champion, Lee Sedol, in 2016. (The game has more potential moves than there are atoms in the universe!)</p>



<p>But there have certainly been other applications of the technology that go beyond gaming.&nbsp;To this end, reinforcement learning has been particularly useful with robotics.&nbsp;For example, OpenAI has used this technique for a robotic arm that was able to solve the Rubik’s cube.&nbsp;</p>



<p>Reinforcement learning has even been shown to be effective when finding better solutions for tax policies and equality, as seen with Salesforce.com’s AI Economist. “We believe a reinforcement learning framework is well-suited for uncovering insights on how the behavior of economic agents could be influenced by pulling different policy ‘levers,’” said Richard Socher, the Chief Scientist at Salesforce. “This is one of many scenarios where we believe reinforcement learning can be utilized in the future.”</p>



<p>Here are some other areas where reinforcement learning can make an impact:&nbsp;</p>



<ul class="wp-block-list"><li>Entertainment: “The future consists of free-form environments that the next generation of ‘movie-goers’ and gamers are looking for,” said Yuheng Chen, the COO of rct studio. “AI-powered characters will co-adapt to produce elaborate storylines, and consumers will no longer be confined to fixed dialogues and rigid interaction between non-player characters.”</li><li>Healthcare: “Imagine trying to use reinforcement learning to teach an AI doctor how to treat a medical patient,” said Noah Giansiracusa, an Assistant Professor of Mathematical Sciences at Bentley University. “The AI doctor might try medications almost randomly to see what effect they have and over time should learn the patterns and develop an understanding of which medications work best in which situations. But we obviously can&#8217;t let the AI doctor perform these experiments on real patients, and physiology is far too complicated to build a suitable computer simulation of the human body to experiment on virtually. However, with vast troves of medical data, when the AI doctor wants to try a certain medication on a certain patient, we can look through the data and find an actual historic patient who had similar symptoms and vitals as the current patient, and even find such a patient who was then given the medication in question; thus the AI doctor is not actually performing new experiments to learn; it is suggesting experiments to try and then looking back at past data to see what typically happened when that action was taken.”</li></ul>
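<p>Giansiracusa’s look-up-instead-of-experiment idea can be sketched as a nearest-neighbor query over historic records: answer “what if we gave this medication?” by finding the most similar past patient who actually received it. The records, field names, and outcome scores below are entirely hypothetical:</p>

```python
import math

# Hypothetical patient records: (vitals vector, medication given, observed outcome).
# Data and identifiers are illustrative, not from any real dataset.
records = [
    ((120.0, 80.0, 37.0), "drug_a", 0.9),
    ((150.0, 95.0, 38.5), "drug_a", 0.4),
    ((148.0, 96.0, 38.4), "drug_b", 0.8),
    ((118.0, 79.0, 36.9), "drug_b", 0.6),
]

def lookup_outcome(patient_vitals, medication):
    """Return the outcome of the most similar historic patient who actually
    received `medication`, instead of running a new experiment."""
    candidates = [(v, o) for v, med, o in records if med == medication]
    if not candidates:
        return None
    vitals, outcome = min(candidates,
                          key=lambda c: math.dist(patient_vitals, c[0]))
    return outcome

# A new feverish, hypertensive patient: which drug looked better historically?
patient = (149.0, 94.0, 38.6)
best = max(["drug_a", "drug_b"], key=lambda m: lookup_outcome(patient, m))
print(best)
```

<p>The “AI doctor” never acts on a live patient; every candidate action is scored purely from past data, which is the essence of the offline setup described in the quote.</p>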



<p>Now reinforcement learning is still in the nascent phases.&nbsp;But given the advances so far, this approach to AI is likely to get more important.&nbsp; “I believe reinforcement learning is on the cusp of rippling through and disrupting a lot of industries,” said Giansiracusa.</p>
<p>The post <a href="https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/">Reinforcement Learning: The Next Big Thing For AI (Artificial Intelligence)?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/reinforcement-learning-the-next-big-thing-for-ai-artificial-intelligence/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>DeepMind hopes to teach AI to cooperate by playing Diplomacy</title>
		<link>https://www.aiuniverse.xyz/deepmind-hopes-to-teach-ai-to-cooperate-by-playing-diplomacy/</link>
					<comments>https://www.aiuniverse.xyz/deepmind-hopes-to-teach-ai-to-cooperate-by-playing-diplomacy/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Thu, 11 Jun 2020 07:07:03 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[AI software]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=9458</guid>

					<description><![CDATA[<p>Source: venturebeat.com DeepMind, the Alphabet-backed machine learning lab that’s tackled chess, Go, Starcraft 2, Montezuma’s Revenge, and beyond, believes the board game Diplomacy could motivate a promising new direction in <a class="read-more-link" href="https://www.aiuniverse.xyz/deepmind-hopes-to-teach-ai-to-cooperate-by-playing-diplomacy/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-hopes-to-teach-ai-to-cooperate-by-playing-diplomacy/">DeepMind hopes to teach AI to cooperate by playing Diplomacy</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: venturebeat.com</p>



<p>DeepMind, the Alphabet-backed machine learning lab that’s tackled chess, Go, Starcraft 2, Montezuma’s Revenge, and beyond, believes the board game Diplomacy could motivate a promising new direction in reinforcement learning research. In a paper published on the preprint server Arxiv.org, the firm’s researchers describe an AI system that achieves high scores in Diplomacy while yielding “consistent improvements.”</p>



<p>AI systems have achieved strong competitive play in complex, large-scale games like Hex, shogi, and poker, but the bulk of these are two-player zero-sum games, where a player can win only by causing another player to lose. That doesn’t necessarily reflect the real world; tasks like route planning around congestion, contract negotiations, and interacting with customers all involve compromise and consideration of how group members’ preferences coincide and conflict. Even when AI software agents are self-interested, they might gain by coordinating and cooperating, so interacting among diverse groups requires complex reasoning about others’ goals and motivations.</p>



<p>The game Diplomacy forces these interactions by tasking seven players with controlling multiple units on a province-level map of Europe. Each turn, all players move all of their units simultaneously, and one unit may support another unit owned by the same or another player to allow it to overcome resistance by other units. (Alternatively, units — which have equal strength — can hold a province or move to an adjacent one.) Thirty-four of the provinces are supply centers, and units capture supply centers by occupying them. Owning more supply centers allows a player to build more units, and the game is won by owning a majority of the supply centers.</p>



<p>Due to the interdependencies between units, players must negotiate the moves of their own units. They stand to gain by coordinating their moves with those of other players, and they must anticipate how other players will act and reflect these expectations in their actions.</p>



<p>“We propose using games like Diplomacy to study the emergence and detection of manipulative behaviors … to make sure that we know how to mitigate such behaviors in real-world applications,” the coauthors wrote. “Research on Diplomacy could pave the way towards creating artificial agents that can successfully cooperate with others, including handling difficult questions that arise around establishing and maintaining trust and alliances.”</p>



<p>DeepMind focused on the “no press” variant of Diplomacy, where no explicit communication is allowed. It trained reinforcement learning agents — agents that take actions to maximize some reward — using an approach called Sampled Best Responses (SBR), which handles the enormous number of actions (around 10⁶⁴) players can take in Diplomacy through a policy iteration technique that approximates best responses to other players’ actions, combined with fictitious play.</p>
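<p>The core of SBR can be sketched in a few lines: rather than enumerating every joint action, score each of our candidate actions against a <em>sample</em> of opponent actions drawn from their current policy and pick the best. This is a simplified illustration of the idea, not DeepMind’s implementation; the function names and toy payoff are assumptions:</p>

```python
import random

def sampled_best_response(value_fn, my_candidates, opponent_policy,
                          n_samples=32, seed=0):
    """Approximate a best response when the joint action space is huge:
    estimate each candidate's value by Monte Carlo sampling of opponent
    action profiles instead of enumerating all combinations."""
    rng = random.Random(seed)
    opponent_samples = [opponent_policy(rng) for _ in range(n_samples)]
    def score(action):
        return sum(value_fn(action, opp) for opp in opponent_samples) / n_samples
    return max(my_candidates, key=score)

# Toy example: the opponent plays 0 about 80% of the time, and our payoff
# is 1 whenever our action differs from theirs, so action 1 should win.
opponent_policy = lambda rng: 0 if rng.random() < 0.8 else 1
value_fn = lambda a, opp: 1.0 if a != opp else 0.0
best = sampled_best_response(value_fn, [0, 1], opponent_policy)
print(best)
```

<p>The sample size trades accuracy for cost: a handful of opponent samples already separates clearly better actions, which is what makes the approach viable at Diplomacy’s scale.</p>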



<p>At each iteration, DeepMind’s system creates a data set of games, with actions chosen by a module called an improvement operator that uses a previous strategy (policy) and value function to find a policy that defeats the previous policy. It then trains the policy and value functions to predict the actions the improvement operator will choose as well as the game results.</p>



<p>The aforementioned SBR identifies policies that maximize the expected return for the system’s agents against opponents’ policies. SBR is coupled with Best Response Policy Iteration (BRPI), a family of algorithms tailored to using SBRs in many-player games, the most sophisticated of which trains the policies to predict only the latest BR and explicitly averages historical checkpoints to provide the current empirical strategy.</p>



<p>To evaluate the system’s performance, DeepMind measured the head-to-head win rates against six agents from different algorithms and against a population of six players independently drawn from a reference corpus. They also considered “meta-games” between checkpoints of one training run to test for consistent improvement and examined the exploitability (the margin by which an adversary would defeat a population of agents) of the game-playing agents.</p>



<p>The system’s win rates weren’t especially high — averaged over five seeds of each game, they ranged between 12.7% and 32.5% — but DeepMind notes that they represent a large improvement over agents trained with supervised learning. Against one algorithm in particular — DipNet — in a 6-to-1 game, where six of the agents were controlled by DeepMind’s system, the win rates of DeepMind’s agents improved steadily through training.</p>



<p>In future work, the researchers plan to investigate ways to reduce the agents’ exploitability and build agents that reason about the incentives of others, potentially through communication. “Using [reinforcement learning] to improve game-play in … Diplomacy is a prerequisite for investigating the complex mixed motives and many-player aspects of this game … Beyond the direct impact on Diplomacy, possible applications of our method include business, economic, and logistics domains … In providing the capability of training a tactical baseline agent for Diplomacy or similar games, this work also paves the way for research into agents that are capable of forming alliances and use more advanced communication abilities, either with other machines or with humans.”</p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-hopes-to-teach-ai-to-cooperate-by-playing-diplomacy/">DeepMind hopes to teach AI to cooperate by playing Diplomacy</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deepmind-hopes-to-teach-ai-to-cooperate-by-playing-diplomacy/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Why do people focus on deep learning when it comes to artificial intelligence?</title>
		<link>https://www.aiuniverse.xyz/why-do-people-focus-on-deep-learning-when-it-comes-to-artificial-intelligence/</link>
					<comments>https://www.aiuniverse.xyz/why-do-people-focus-on-deep-learning-when-it-comes-to-artificial-intelligence/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 06 Jun 2020 07:02:06 +0000</pubDate>
				<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Future]]></category>
		<category><![CDATA[Google]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=9331</guid>

					<description><![CDATA[<p>Source: optocrypto.com There are other areas of AI that look promising, such as Deep Learning. Remember that top companies like Google’s DeepMind and OpenAI are already working <a class="read-more-link" href="https://www.aiuniverse.xyz/why-do-people-focus-on-deep-learning-when-it-comes-to-artificial-intelligence/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/why-do-people-focus-on-deep-learning-when-it-comes-to-artificial-intelligence/">Why do people focus on deep learning when it comes to artificial intelligence?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: optocrypto.com</p>



<p>There are other areas of AI besides deep learning that look promising, such as reinforcement learning. Remember that top companies like Google’s DeepMind and OpenAI are already working on this approach and have made breakthroughs with it.</p>



<p>So what is reinforcement learning? Well, interestingly, it’s not new. “Reinforcement learning is a classic behavioral phenomenon widely known in the psychological literature since the early 1950s,” says Dr. Matt Johnson, a professor of psychology at Hult International Business School and author of Blindsight: The (Mostly) Hidden Ways Marketing Reshapes Our Brains. “In its simplest form, it states that the frequency of a behavior will rise or fall depending on the immediate consequences of that behavior. This applies to both animal and human behavior.”</p>



<h4 class="wp-block-heading">Future is AI &amp; Deep Learning</h4>



<p>However, some of the key principles of reinforcement learning have been applied to AI models. Fiddler’s head of data science Ankur Taly says: “Reinforcement learning entails an agent, an action and a reward. An agent, such as a robot or a character, interacts with its environment, observes certain activities and reacts accordingly in order to achieve useful or desirable results. Reinforcement learning follows a specific approach and determines the best way to achieve the best results. This is very similar to the structure in which we play video games, where the agent makes a series of attempts to obtain the highest score or maximum reward. After many iterations, it learns to maximize its cumulative rewards.”</p>



<h4 class="wp-block-heading">Machine Learning is changing the world</h4>



<p>In fact, some of the most interesting applications of reinforcement learning are complex games. Consider the case of DeepMind’s AlphaGo. The system quickly learned how to play Go through reinforcement learning and beat world champion Lee Sedol in 2016 (the game has more possible moves than the number of atoms in the universe!).</p>



<p>But of course, there are other applications for the technology than just games. Reinforcement learning has proven particularly useful for robotics. OpenAI, for example, has used the technique for a robot arm that is able to solve the Rubik’s Cube.</p>



<h4 class="wp-block-heading">Here are some other areas where reinforcement learning can have an impact:</h4>



<p>Entertainment: “The future will consist of the free-form environments that the next generation of ‘movie lovers’ and gamers are looking for. AI-driven characters will co-adapt to generate detailed storylines, and consumers will no longer be confined to fixed conversations and rigid interactions between non-player characters.”</p>



<p>Healthcare: “Imagine trying to use reinforcement learning to teach an AI doctor how to treat medical patients. The AI doctor could try drugs almost at random to see how they work, and over time it should learn the patterns and understand which drugs work best in which situations. But we obviously can’t let AI doctors perform these experiments on real patients, and physiology is too complex to construct a suitable computer simulation of the human body for virtual experiments.”</p>



<p>The post <a href="https://www.aiuniverse.xyz/why-do-people-focus-on-deep-learning-when-it-comes-to-artificial-intelligence/">Why do people focus on deep learning when it comes to artificial intelligence?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/why-do-people-focus-on-deep-learning-when-it-comes-to-artificial-intelligence/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>DeepMind releases Acme, a distributed framework for reinforcement learning algorithm development</title>
		<link>https://www.aiuniverse.xyz/deepmind-releases-acme-a-distributed-framework-for-reinforcement-learning-algorithm-development/</link>
					<comments>https://www.aiuniverse.xyz/deepmind-releases-acme-a-distributed-framework-for-reinforcement-learning-algorithm-development/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Thu, 04 Jun 2020 09:43:16 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[framework]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=9281</guid>

					<description><![CDATA[<p>Source: venturebeat.com DeepMind this week released Acme, a framework intended to simplify the development of reinforcement learning algorithms by enabling AI-driven agents to run at various scales of execution. According <a class="read-more-link" href="https://www.aiuniverse.xyz/deepmind-releases-acme-a-distributed-framework-for-reinforcement-learning-algorithm-development/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-releases-acme-a-distributed-framework-for-reinforcement-learning-algorithm-development/">DeepMind releases Acme, a distributed framework for reinforcement learning algorithm development</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: venturebeat.com</p>



<p>DeepMind this week released Acme, a framework intended to simplify the development of reinforcement learning algorithms by enabling AI-driven agents to run at various scales of execution. According to the engineers and researchers behind Acme, who coauthored a technical paper on the work, it can be used to create agents with greater parallelization than in previous approaches.</p>



<p>Reinforcement learning involves agents that interact with an environment to generate their own training data, and it’s led to breakthroughs in fields from video games and robotics to self-driving robo-taxis. Recent advances are partly attributable to increases in the <em>amount</em> of training data used, which has motivated the design of systems where agents interact with instances of an environment to quickly accumulate experience. This scaling from single-process prototypes of algorithms to distributed systems often requires a reimplementation of the agents in question, DeepMind asserts, which is where the Acme framework comes in.</p>



<p>Acme is a development suite for training reinforcement learning agents that attempts to address the issues of both complexity and scale, with components for constructing agents at various levels of abstraction from algorithms and policies to learners. The thinking goes that this will allow for the swift iteration of ideas and the evaluation of those ideas in production, chiefly through training loops, obsessive logging, and checkpointing.</p>



<p>Within Acme, actors interact closely with an environment, making observations produced by the environment and taking actions that in turn feed into the environment. After observing the ensuing transition, the actors are given an opportunity to update their states; this most often relates to their action-selection policies, which determine which actions they take in response to the environment. A special type of Acme actor comprises both acting and learning components — they’re referred to as “agents” — and their state updates are triggered by some number of steps within the learner component. That said, agents for the most part defer their action selection to their own acting component.</p>
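<p>The actor/learner split described above can be sketched with minimal interfaces: an actor selects actions and observes transitions, while a learner owned by the agent is stepped every few observations. These classes are illustrative stand-ins, not Acme’s actual API:</p>

```python
# Minimal actor/learner split in the spirit described above
# (illustrative interfaces only -- not Acme's real components).
class Learner:
    def __init__(self):
        self.policy_weight = 0.0  # stand-in for network parameters

    def step(self, transitions):
        # pretend "learning": nudge the parameter by the mean observed reward
        rewards = [r for (_, _, r, _) in transitions]
        self.policy_weight += 0.1 * (sum(rewards) / len(rewards))

class Agent:
    """An actor that owns a learner and triggers it every `update_every` steps."""
    def __init__(self, learner, update_every=4):
        self.learner = learner
        self.update_every = update_every
        self.buffer = []

    def select_action(self, observation):
        # action selection stays on the acting side (a trivial policy here)
        return 1 if observation + self.learner.policy_weight > 0 else 0

    def observe(self, obs, action, reward, next_obs):
        self.buffer.append((obs, action, reward, next_obs))
        if len(self.buffer) >= self.update_every:
            self.learner.step(self.buffer)
            self.buffer.clear()

agent = Agent(Learner())
for t in range(8):  # a tiny hand-rolled environment loop
    obs = -1.0 if t % 2 else 1.0
    action = agent.select_action(obs)
    agent.observe(obs, action, reward=1.0, next_obs=obs)
print(agent.learner.policy_weight)
```

<p>Keeping `select_action` and `step` behind separate interfaces is what lets the same agent run single-process or distributed: only the plumbing between the two halves changes.</p>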



<p>Acme provides a data set module that sits between the actor and learner components, backed by a low-level storage system called Reverb that DeepMind also released this week. In addition, the framework establishes a common interface for insertion into Reverb, enabling different styles of preprocessing and the ongoing aggregation of observational data.</p>



<p>Acting, learning, and storage components are split among different threads or processes within Acme, which confers two benefits: environment interactions occur asynchronously with the learning process, and data generation accelerates. Elsewhere, Acme’s rate limitation allows the enforcement of a desired rate from learning to acting, allowing processes to run unblocked so long as they remain within some defined tolerance. For instance, if one of the processes starts lagging behind the other due to network issues or insufficient resources, the rate limiter will block the laggard while the other catches up.</p>
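<p>The rate-limiting idea can be sketched single-threaded: track actor and learner step counts and let each side proceed only while it stays within tolerance of a target acting-to-learning ratio. The class name and thresholds below are assumptions for illustration; the real system blocks across threads or processes:</p>

```python
class RateLimiter:
    """Keep the actor-steps-per-learner-step ratio near a target by refusing
    whichever side runs too far ahead (a single-threaded sketch of the idea)."""
    def __init__(self, target_ratio=8.0, tolerance=2.0):
        self.target = target_ratio
        self.tol = tolerance
        self.actor_steps = 0
        self.learner_steps = 0

    def can_act(self):
        # the actor may proceed unless it is too far ahead of the learner
        return self.actor_steps < (self.learner_steps + 1) * self.target + self.tol

    def can_learn(self):
        # the learner may proceed once enough fresh experience exists
        return self.actor_steps >= self.learner_steps * self.target - self.tol

    def acted(self):
        self.actor_steps += 1

    def learned(self):
        self.learner_steps += 1

rl = RateLimiter(target_ratio=4.0, tolerance=0.0)
trace = []
for _ in range(15):
    if rl.can_act():
        rl.acted(); trace.append("A")
    elif rl.can_learn():
        rl.learned(); trace.append("L")
print("".join(trace))  # four actor steps interleave with each learner step
```

<p>With a target ratio of 4 and zero tolerance, the schedule settles into four acting steps per learning step; loosening the tolerance lets a temporarily lagging side catch up without stalling the other.</p>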



<p>In addition to these tools and resources, Acme ships with a set of example agents meant to serve as reference implementations of their respective reinforcement learning algorithms as well as strong research baselines. More might become available in the future, DeepMind says. “By providing these … we hope that Acme will help improve the status of reproducibility in [reinforcement learning], and empower the academic research community with simple building blocks to create new agents,” wrote the researchers. “Additionally, our baselines should provide additional yardsticks to measure progress in the field.”</p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-releases-acme-a-distributed-framework-for-reinforcement-learning-algorithm-development/">DeepMind releases Acme, a distributed framework for reinforcement learning algorithm development</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deepmind-releases-acme-a-distributed-framework-for-reinforcement-learning-algorithm-development/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>TOP 10 REINFORCEMENT LEARNING PAPERS FROM ICLR 2020</title>
		<link>https://www.aiuniverse.xyz/top-10-reinforcement-learning-papers-from-iclr-2020/</link>
					<comments>https://www.aiuniverse.xyz/top-10-reinforcement-learning-papers-from-iclr-2020/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 03 Jun 2020 07:15:52 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[researchers]]></category>
		<category><![CDATA[Robotics]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=9239</guid>

					<description><![CDATA[<p>Source: analyticsindiamag.com Reinforcement Learning has become the base approach in order to attain artificial general intelligence. The ICLR (International Conference on Learning Representations) is one of the major <a class="read-more-link" href="https://www.aiuniverse.xyz/top-10-reinforcement-learning-papers-from-iclr-2020/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/top-10-reinforcement-learning-papers-from-iclr-2020/">TOP 10 REINFORCEMENT LEARNING PAPERS FROM ICLR 2020</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source: analyticsindiamag.com</p>



<p>Reinforcement learning has become a foundational approach in the pursuit of artificial general intelligence. The ICLR (International Conference on Learning Representations) is one of the major AI conferences that takes place every year. Of the more than 600 research papers accepted at this year’s conference, around 44 are on reinforcement learning.</p>



<p>This article lists the top 10 reinforcement learning papers from ICLR 2020 that one must read.</p>



<h4 class="wp-block-heading"><strong>1| Graph Convolutional Reinforcement Learning</strong></h4>



<p><strong>About:</strong> In this paper, the researchers proposed graph convolutional reinforcement learning. In this model, the graph convolution adapts to the dynamics of the underlying graph of the multi-agent environment, whereas relation kernels capture the interplay between agents through their relation representations. In simple words, the multi-agent environment is modelled as a graph, and graph convolutional reinforcement learning, also called DGN, is instantiated based on a deep Q-network and trained end-to-end.</p>



<p>According to the researchers, unlike other parameter-sharing methods, graph convolution enhances the cooperation of agents by allowing the policy to be optimised by jointly considering agents in the receptive field and promoting mutual help.</p>
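<p>The aggregation step of such a graph convolution can be sketched as mixing each agent’s features with those of the agents in its receptive field. This toy mean-aggregation version is illustrative only; the paper’s relation kernels are learned rather than fixed averages:</p>

```python
# One mean-aggregation "graph convolution" over per-agent features.
def graph_conv(features, neighbors):
    """features: list of per-agent feature vectors.
    neighbors: adjacency lists mapping agent index -> neighbor indices.
    Each agent's new feature is the mean over itself and its neighbors."""
    out = []
    for i, f in enumerate(features):
        group = [features[j] for j in neighbors[i]] + [f]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
adj = {0: [1], 1: [0, 2], 2: [1]}  # agent 1 sits between agents 0 and 2
mixed = graph_conv(feats, adj)
print(mixed)
```

<p>After one layer, each agent’s representation already reflects its neighbors; stacking layers widens the receptive field, which is how joint optimisation over nearby agents becomes possible.</p>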



<h4 class="wp-block-heading"><strong>2| Measuring the Reliability of Reinforcement Learning Algorithms</strong></h4>



<p><strong>About:</strong> Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. In this paper, the researchers proposed a set of metrics that quantitatively measure different aspects of reliability. </p>



<p>According to the researchers, the analysis distinguishes between several typical modes to evaluate RL performance, such as “evaluation during training” that is computed over the course of training vs “evaluation after learning”, which is evaluated on a fixed policy after it has been trained. These metrics are also designed to measure different aspects of reliability, e.g. reproducibility (variability across training runs and variability across rollouts of a fixed policy) or stability (variability within training runs).</p>
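<p>Metrics of this flavor can be sketched directly: dispersion across independent runs at each checkpoint (reproducibility) and step-to-step variability within a single run (stability). The toy data and the exact statistics below are simplified stand-ins for the paper’s metrics:</p>

```python
import statistics

# Toy per-checkpoint returns from three independent training runs.
runs = [
    [1.0, 2.0, 3.0, 4.0],   # run A
    [1.0, 2.5, 2.0, 4.5],   # run B
    [0.5, 1.5, 3.5, 4.0],   # run C
]

def across_run_dispersion(runs):
    """Std-dev across runs at each checkpoint (variability across training runs)."""
    return [statistics.pstdev(col) for col in zip(*runs)]

def within_run_variability(run):
    """Mean absolute step-to-step change within one run (stability during training)."""
    diffs = [abs(b - a) for a, b in zip(run, run[1:])]
    return sum(diffs) / len(diffs)

print(across_run_dispersion(runs))
print([within_run_variability(r) for r in runs])
```

<p>The two views answer different questions: the first flags algorithms whose results depend heavily on the seed, the second flags algorithms whose performance oscillates even within a single run.</p>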



<h4 class="wp-block-heading"><strong>3| Behaviour Suite for Reinforcement Learning</strong></h4>



<p><strong>About: </strong>Researchers at DeepMind introduce the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate the core capabilities of reinforcement learning agents with two objectives.</p>



<p>First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks.</p>



<h4 class="wp-block-heading"><strong>4| The Ingredients of Real World Robotic Reinforcement Learning</strong></h4>



<p><strong>About:&nbsp;</strong>In this paper, researchers at UC Berkeley and their collaborators discussed the elements of a robotic learning system that can autonomously improve with data collected in the real world. They proposed a particular instantiation of such a system using dexterous manipulation and investigated several challenges that come up when learning without instrumentation.</p>



<p>Furthermore, the researchers proposed simple and scalable solutions to these challenges, and then demonstrated the efficacy of the proposed system on a set of dexterous robotic manipulation tasks. They also provided an in-depth analysis of the challenges associated with this learning paradigm.</p>



<h4 class="wp-block-heading"><strong>5| Network Randomisation: A Simple Technique for Generalisation in Deep Reinforcement Learning</strong></h4>



<p><strong>About: </strong>Here, the researchers proposed a simple technique to improve a generalisation ability of deep RL agents by introducing a randomised (convolutional) neural network that randomly perturbs input observations. The technique enables trained agents to adapt to new domains by learning robust features invariant across varied and randomised environments.</p>
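<p>The randomisation idea can be sketched in one dimension: pass each observation through a freshly re-initialised random linear filter, so the agent sees varied views of the same input and cannot latch onto superficial statistics. The kernel width and normalisation below are assumptions; the paper applies a randomised convolutional layer to image observations:</p>

```python
import random

def random_filter(rng, width=3):
    """Draw a fresh random 1-D kernel, normalised so outputs stay bounded."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    s = sum(abs(x) for x in w) or 1.0
    return [x / s for x in w]

def perturb(obs, kernel):
    """Zero-padded 1-D convolution of the observation with the random kernel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + obs + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(obs))]

rng = random.Random(0)
obs = [1.0, 2.0, 3.0, 4.0]
# re-initialise the kernel per "episode" so the agent sees two distinct views
views = [perturb(obs, random_filter(rng)) for _ in range(2)]
print(views)
```

<p>Training against a stream of such perturbed views pushes the agent toward features that survive the randomisation, which is the invariance the paper is after.</p>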



<h4 class="wp-block-heading"><strong>6| On the Weaknesses of Reinforcement Learning for Neural Machine Translation</strong></h4>



<p><strong>About:</strong> Reinforcement learning (RL) is frequently used to increase performance in text generation tasks, including machine translation (MT), through methods such as Minimum Risk Training (MRT) and Generative Adversarial Networks (GANs). In this paper, the researchers proved that one of the most common RL methods for MT does not optimise the expected reward, and showed that other methods take an infeasibly long time to converge. They further suggested that RL practices in machine translation are likely to improve performance only in some cases, such as where the pre-trained parameters are already close to yielding the correct translation.</p>
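<p>For context, the family of objectives the paper analyses weights sampled translations by a scalar reward (for example, sentence-level BLEU), so the gradient pushes probability mass toward high-reward samples. The sketch below is a generic REINFORCE-style expected-reward loss with a mean baseline; the numbers are made up for illustration and are not from the paper.</p>

```python
import numpy as np

# Toy expected-reward objective: log-probabilities of three sampled
# translations and their rewards (e.g. sentence-level BLEU). Subtracting a
# baseline reduces gradient variance without changing the expectation.

log_probs = np.array([-1.2, -0.5, -2.0])   # log p(y_i | x) of sampled outputs
rewards = np.array([0.9, 0.2, 0.4])        # reward of each sampled output

baseline = rewards.mean()
loss = -np.mean((rewards - baseline) * log_probs)
```

Minimising this loss raises the probability of samples whose reward beats the baseline, which only helps if the model already places non-negligible probability on good translations, in line with the paper's caveat about pre-trained parameters.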



<h4 class="wp-block-heading"><strong>7| Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation</strong></h4>



<p><strong>About:</strong> In this paper, the researchers proposed a reinforcement learning based graph-to-sequence (Graph2Seq) model for Natural Question Generation (QG). The model consists of a Graph2Seq generator with a novel Bidirectional Gated Graph Neural Network-based encoder to embed the passage and a hybrid evaluator with a mixed objective combining both cross-entropy and RL losses to ensure the generation of syntactically and semantically valid text. The proposed model is end-to-end trainable, achieves new state-of-the-art scores, and outperforms existing methods by a significant margin on the standard SQuAD benchmark for QG. </p>
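<p>The hybrid evaluator's mixed objective can be sketched as a convex combination of a cross-entropy loss on the reference question and a REINFORCE-style loss on a sampled question. The functional form and the weight <code>gamma</code> below are assumed for illustration; the input numbers are made up.</p>

```python
def mixed_loss(ce_loss, sample_log_prob, sample_reward, gamma=0.75):
    """Blend a supervised CE loss with a policy-gradient (RL) loss."""
    rl_loss = -sample_reward * sample_log_prob   # REINFORCE-style term
    return gamma * rl_loss + (1.0 - gamma) * ce_loss

# Illustrative values: CE loss on the ground-truth question, plus the
# log-probability and reward of one sampled question.
loss = mixed_loss(ce_loss=2.0, sample_log_prob=-1.5, sample_reward=0.6)
```

The cross-entropy term anchors the generator to fluent, syntactically valid text, while the reward term lets a semantic metric shape generation beyond token-level matching.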



<h4 class="wp-block-heading"><strong>8| Adversarial Policies: Attacking Deep Reinforcement Learning</strong></h4>



<p><strong>About:</strong> Deep reinforcement learning policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. In this paper, the researchers proposed a novel and physically realistic threat model for adversarial examples in RL and demonstrated the existence of adversarial policies in this threat model for several simulated robotics games.</p>



<p>The researchers further conducted a detailed analysis of why adversarial policies work and how they reliably beat the victim, despite being trained with less than 3% as many timesteps and generating seemingly random behaviour.</p>



<h4 class="wp-block-heading"><strong>9| Causal Discovery with Reinforcement Learning</strong></h4>



<p><strong>About:</strong> Discovering causal structure among a set of variables is a fundamental problem in many empirical sciences. In this paper, the researchers proposed to use reinforcement learning to search for the Directed Acyclic Graph (DAG) with the best score. The encoder-decoder model takes observable data as input and generates graph adjacency matrices that are used to compute rewards. In contrast with typical RL applications where the goal is to learn a policy, they used RL as a search strategy, and the final output is the graph, among all graphs generated during training, that achieves the best reward.</p>
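<p>The "RL as search" framing can be sketched as a loop that generates candidate adjacency matrices, scores each one, and keeps the best-scoring graph seen so far. The random generator and toy score below are stand-ins for the paper's encoder-decoder and its BIC-style scoring function.</p>

```python
import numpy as np

def score(adj):
    """Toy score: prefer sparse graphs and heavily penalise self-loops."""
    penalty = np.trace(adj) * 10.0
    return -adj.sum() - penalty

rng = np.random.default_rng(0)
best_adj, best_reward = None, -np.inf
for _ in range(200):                       # 'training' used purely as search
    adj = rng.integers(0, 2, size=(3, 3))  # candidate adjacency matrix
    r = score(adj)
    if r > best_reward:                    # final output = best graph seen
        best_adj, best_reward = adj, r
```

The key design choice mirrors the paper: the learned generator only matters insofar as it proposes good candidates, because the returned artefact is the best graph, not the policy.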



<h4 class="wp-block-heading"><strong>10| Model-Based Reinforcement Learning for Atari</strong></h4>



<p><strong>About:</strong>&nbsp;In this paper, the researchers explored how video prediction models can similarly enable agents to solve Atari games with fewer interactions than model-free methods. They described Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, and presented a comparison of several model architectures, including a novel architecture that yields the best results in this setting. According to the researchers, SimPLe outperformed state-of-the-art model-free algorithms in most games, in some games by over an order of magnitude.</p>
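<p>The overall SimPLe loop, at the level described here, alternates between collecting real experience, fitting a world model to it, and improving the policy inside the learned model. Every component in the sketch below is a toy stand-in (the real algorithm uses a video prediction model and a policy-gradient learner), so only the loop structure should be read as the technique.</p>

```python
import random

real_data = []  # accumulated (observation, action) pairs from the real env

def collect_real(policy, steps=10):
    """Gather a small batch of real experience under the current policy."""
    rng = random.Random(len(real_data))
    return [(rng.random(), policy(rng.random())) for _ in range(steps)]

def fit_world_model(data):
    """Toy 'world model': the mean observed value stands in for learned dynamics."""
    values = [obs for obs, _ in data]
    return sum(values) / len(values)

def improve_policy(model):
    """Toy policy 'trained inside the model': act relative to its estimate."""
    return lambda obs: obs > model

policy = lambda obs: True                  # initial placeholder policy
for _ in range(3):                         # a few SimPLe-style iterations
    real_data += collect_real(policy)      # 1. collect real interactions
    model = fit_world_model(real_data)     # 2. fit the world model
    policy = improve_policy(model)         # 3. improve policy in the model
```

The sample-efficiency claim comes from step 3: most policy updates consume simulated rollouts, so the count of real interactions (step 1) stays small.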
<p>The post <a href="https://www.aiuniverse.xyz/top-10-reinforcement-learning-papers-from-iclr-2020/">TOP 10 REINFORCEMENT LEARNING PAPERS FROM ICLR 2020</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/top-10-reinforcement-learning-papers-from-iclr-2020/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
