Reinforcement Learning Archives - Artificial Intelligence

DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning

aiuniverse — Wed, 18 Nov 2020 05:28:19 +0000

Source: computing.co.uk

Alphabet subsidiary DeepMind announced on Monday that it has open-sourced Lab2D, a scalable environment simulator for artificial intelligence (AI) research that facilitates researcher-led experimentation with environment design.

DeepMind describes Lab2D as a system designed to support creation of two-dimensional (2D) layered, discrete “grid-world” environments, in which pieces move around in the same way as chess pieces move around on a chess board.

The system is particularly tailored for multi-agent reinforcement learning, according to Lab2D researchers.

The computationally intensive engine for Lab2D is written in C++ for efficiency, while most of the level-specific logic is written in Lua.

“The environments are ‘grid worlds’, which are defined with a combination of simple text-based maps for the layout of the world, and Lua code for its behaviour,” the researchers state in their study paper.

“Machine learning agents interact with these environments through one of two APIs, the Python dm_env API or a custom C API (which is also used by DeepMind Lab).”
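As a rough illustration of that split between a text map for layout and code for behaviour, the sketch below parses a small ASCII map and exposes `reset` and `step` methods loosely modelled on the reset/step pattern that dm_env-style environments follow. It is plain Python, not Lab2D's Lua scripting or the real dm_env API; the map symbols, the `GridWorld` class, and the reward rule are assumptions made up for this example.

```python
# Hypothetical text map: '#' wall, '.' floor, 'P' player start, 'G' goal.
ASCII_MAP = """
#####
#P..#
#.#G#
#####
""".strip().splitlines()

# Discrete actions mapped to (row, column) offsets.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class GridWorld:
    """Toy grid world: layout from a text map, behaviour in code."""

    def __init__(self, text_map):
        self.grid = [list(row) for row in text_map]
        self.start = self._find("P")
        self.goal = self._find("G")
        self.pos = self.start

    def _find(self, symbol):
        for r, row in enumerate(self.grid):
            for c, cell in enumerate(row):
                if cell == symbol:
                    return (r, c)
        raise ValueError(f"symbol {symbol!r} not in map")

    def reset(self):
        """Put the player back on its start cell and return the observation."""
        self.pos = self.start
        return self.pos

    def step(self, action):
        """Apply one move; walls block movement. Returns (obs, reward, done)."""
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if self.grid[r][c] != "#":
            self.pos = (r, c)
        done = self.pos == self.goal
        return self.pos, (1.0 if done else 0.0), done


env = GridWorld(ASCII_MAP)
env.reset()
trajectory = [env.step(a) for a in ["right", "right", "down"]]
```

Changing the world here means editing the map string or the `step` rule, which mirrors the kind of environment tinkering the researchers describe.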

The researchers note that in the rush to create artificial general intelligence which will work in any environment, ‘tinkering’ with environmental variables has become unfashionable. Nevertheless, in real-world use cases simulated environments are essential to discover how systems based on reinforcement learning develop an understanding of the conditions in which they operate.

They make the case that 2D environments are inherently easier to understand than three-dimensional ones, at very little, if any, loss of expressiveness, and are more performant and easier to use.

“Rich complexity along numerous dimensions can be studied in 2D just as readily as in 3D, if not more so.”

They note that 2D worlds have been successfully used to study problems as diverse as navigation, social complexity, imperfect information, and abstract reasoning.

“2D worlds can often capture the relevant complexity of the problem at hand without the need for continuous-time physical environments.”

Another advantage of 2D worlds is that they are easier to program and design than their 3D counterparts. This is particularly true when the 3D world actually exploits space or physical dynamics beyond the capabilities of 2D ones.

Moreover, 2D worlds do not need complex 3D assets to be evocative, or any reasoning about lighting, shaders, and projections.

The decision to open-source Lab2D comes after DeepMind released OpenSpiel, a framework for reinforcement learning in games, designed to “promote general multi-agent reinforcement learning across many different game types, in a similar way as general game-playing but with a heavy emphasis on learning and not in competition form.”

Lab2D seeks to build on this work by providing a means to study how agents learn.

“We think that progress toward artificial general intelligence requires robust simulation platforms to enable in silico exploration of agent learning, skill acquisition, and careful measurement. We hope that the system we introduce here, DeepMind Lab2D, can fill this role.”

The post DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning appeared first on Artificial Intelligence.

A VR Film/Game with AI Characters Can Be Different Every Time You Watch or Play

aiuniverse — Mon, 19 Oct 2020 06:44:02 +0000

Source: technologyreview.com

The square-faced, three-legged alien shoves and jostles to get at the enormous plant taking over its tiny planet. But each bite just makes the forbidden fruit grow bigger. Suddenly the plant’s weight flips the whole sphere upside down and all the little creatures drop into space.

Quick! Reach in and catch one!

Agence, a short interactive VR film from Toronto-based studio Transitional Forms and the National Film Board of Canada, won’t be breaking any box office records. Falling somewhere in the no-man’s-land between movies and video games, it may struggle to find an audience at all. But as the first example of a film that uses reinforcement learning to control its animated characters, it could be a glimpse into the future of filmmaking.

“I am super passionate about artificial intelligence because I believe that AI and movies belong together,” says the film’s director, Pietro Gagliano.

Gagliano previously won the first-ever Emmy for a VR experience in 2015. Now he and producer David Oppenheim at the National Film Board of Canada are experimenting with a kind of storytelling they call dynamic film. “We see Agence as a sort of silent-era dynamic film,” says Oppenheim. “It’s a beginning, not a blockbuster.”

Agence debuted at the Venice International Film Festival last month and was released this week to watch or play via Steam, an online video-game platform. The basic plot revolves around a group of creatures and their appetite for a mysterious plant that appears on their planet. Can they control their desire, or will they destabilize the planet and get tipped to their doom? Survivors ascend to another world. After several ascensions, there is a secret ending, says Oppenheim.

Gagliano and Oppenheim want viewers to have the option of sitting back and watching a story unfold, with the AI characters left to their own devices, or getting involved and changing the action on the fly. There’s a broad spectrum of interactivity, says Gagliano: “A lot of interactive films have decision moments, when you can branch the narrative, but I wanted to create something that let you transform the story at any point.”

A certain degree of interactivity comes from choosing the type of AI that controls each character. You can make some use rule-based AI, which guides the character using simple heuristics—if this happens, then do that. Then you can make others become reinforcement-learning agents trained to seek rewards however they like, such as fighting for a bite of the fruit. Characters that follow rules stick closer to Gagliano’s direction; RL agents inject some chaos.

But you can also lean in. Using VR controls or a game pad, you can grab characters and move them around, plant more giant flowers, and help balance the planet. The characters carry on with their business around you, seeking their rewards as best they can.

The film got some interest in Venice, says Oppenheim: “A lot of people come looking for that mix of story and interactivity. Introducing AI into the mix was something that people responded really well to.”

Gagliano’s mother also likes it. When he showed it to her, she spent the whole time breaking up fights between the creatures. “She was like, ‘You behave! You go back here and you play nicely,’” he says. “That was a storyline I wasn’t expecting.”

But people expecting a game have had a cooler response. “Gamers treat it more as a puzzle,” says Oppenheim. And the short running time and lack of challenge have put off some online reviewers.

Still, the pair see Agence as a work in progress. They want to collaborate with other AI developers to give their characters different desires, which would lead to different stories. In the long run, they think, they could use AI to generate all parts of a film, from character behavior to dialogue to entire environments. It could create surprising, dreamlike experiences for all of us, says Oppenheim.


Researchers detail LaND, AI that learns from autonomous vehicle disengagements

aiuniverse — Sat, 17 Oct 2020 06:20:15 +0000

Source: venturebeat.com

UC Berkeley AI researchers say they’ve created AI for autonomous vehicles driving in unseen, real-world landscapes that outperforms leading methods for delivery robots driving on sidewalks. Called LaND, for Learning to Navigate from Disengagements, the navigation system studies disengagement events, then predicts when disengagements will happen in the future. The approach is meant to provide what the researchers call a needed shift in perspective about disengagements for the AI community.

A disengagement describes each instance when an autonomous system encounters challenging conditions and must turn control back over to a human operator. Disengagement events are a contested, and some say outdated, metric for measuring the capabilities of an autonomous vehicle system. AI researchers often treat disengagements as a signal for troubleshooting or debugging navigation systems for delivery robots on sidewalks or autonomous vehicles on roads, but LaND treats disengagements as part of training data.

Doing so, according to engineers from Berkeley AI Research, allows the robot to learn from datasets collected naturally during the testing process. Other systems have learned directly from training data gathered from onboard sensors, but researchers say that can require a lot of labeled data and be expensive.

“Our results demonstrate LaND can successfully learn to navigate in diverse, real world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches,” the paper reads. “Our key insight is that if the robot can successfully learn to execute actions that avoid disengagement, then the robot will successfully perform the desired task. Crucially, unlike conventional reinforcement learning algorithms, which use task-specific reward functions, our approach does not even need to know the task — the task is specified implicitly through the disengagement signal. However, similar to standard reinforcement learning algorithms, our approach continuously improves because our learning algorithm reinforces actions that avoid disengagements.”

LaND utilizes reinforcement learning, but rather than seek a reward, each disengagement event is treated as a way to learn directly from input sensors like a camera while taking into account factors like steering angle and whether autonomy mode was engaged. The researchers detailed LaND in a paper and code published last week on preprint repository arXiv.
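A minimal sketch of the idea that the task is specified implicitly by the disengagement signal: given any model that estimates how likely a candidate steering sequence is to end in a disengagement, the robot picks the sequence with the lowest estimate. The `predict_disengagement` stub, the steering values, and the notion of a sidewalk "offset" are all hypothetical. In LaND the predictor is learned from camera images and logged disengagements, which this toy example does not attempt to reproduce.

```python
import itertools


def predict_disengagement(offset, steering_sequence):
    """Hypothetical stand-in for a learned predictor.

    Returns a score in [0, 1] that grows as the robot drifts away from the
    sidewalk centre (offset 0.0). A real system would learn this from
    logged disengagements rather than use a hand-written formula.
    """
    risk = 0.0
    for steer in steering_sequence:
        offset += steer  # toy dynamics: steering shifts the lateral offset
        risk = max(risk, min(1.0, abs(offset)))
    return risk


def plan(offset, horizon=3, steering_options=(-0.5, 0.0, 0.5)):
    """Pick the steering sequence with the lowest predicted disengagement."""
    candidates = itertools.product(steering_options, repeat=horizon)
    return min(candidates, key=lambda seq: predict_disengagement(offset, seq))


# Start half a unit off-centre; the planner steers back and then holds course.
best = plan(offset=0.5)
```

Note that no reward function for "follow the sidewalk" appears anywhere: the only supervision is the predicted chance of a human having to take over.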

The team collected training data to build LaND by driving a Clearpath Jackal robot on the sidewalks of Berkeley. A human safety driver escorted the robot to reset its course or take over driving for a short period if the robot drove into a street, driveway, or other obstacle. In all, nearly 35,000 data points were collected and nearly 2,000 disengagements were produced during the LaND training on Berkeley sidewalks. Delivery robot startup Kiwibot also operates at UC Berkeley and on nearby sidewalks.

Compared with a deep reinforcement learning algorithm (Kendall et al.) and behavioral cloning, a common method of imitation learning, initial experiments showed that LaND traveled longer distances on sidewalks before disengaging.

In future work, the authors say, LaND can be combined with existing navigation systems, particularly leading imitation learning methods that use data from experts for improved results. Investigating ways to have the robot alert its handlers when it needs human monitoring could lower costs.

In other recent work focused on keeping training costs down for robotic systems, in August a group of UC Berkeley AI researchers created a simple method for training robotic grasping systems that uses an $18 reacher-grabber and a GoPro to collect training data. Last year, Berkeley researchers including Pieter Abbeel, a coauthor of the LaND research, introduced Blue, a general purpose robot that costs a fraction of existing robot systems.


Google Teases Large Scale Reinforcement Learning Infrastructure

aiuniverse — Wed, 07 Oct 2020 06:40:44 +0000

Source: analyticsindiamag.com

The current state-of-the-art reinforcement learning techniques require many iterations over many samples from the environment to learn a target task. For instance, OpenAI Five, which plays the game Dota 2, learns from batches of 2 million frames every 2 seconds. The infrastructure that handles RL at this scale should not only be good at collecting a large number of samples, but also be able to iterate quickly over these extensive amounts of samples during training. Doing this efficiently requires overcoming a few common challenges:

The learner must service a large number of read requests from actors for model retrieval as the number of actors increases.
Processor performance is often restricted by the efficiency of the input pipeline in feeding the training data to the compute cores.
As the number of computing cores increases, the performance of the input pipeline becomes even more critical for the overall training runtime.

So, Google has now introduced Menger, a large-scale distributed reinforcement learning infrastructure with localised inference. It can scale up to several thousand actors across multiple processing clusters, reducing the overall training time in the task of chip placement. Chip placement, or chip floorplanning, is time-consuming and manual. Earlier this year, Google demonstrated how the problem of chip placement could be approached through the lens of deep reinforcement learning to bring down the time needed to design a chip.

Google tested Menger's scalability and efficiency using TPU accelerators on chip placement tasks.

How It Works

The above illustration is an overview of a distributed RL system with multiple actors placed in different Borg cells. Google’s Borg system, described publicly in a 2015 paper, is a cluster manager that runs thousands of jobs, from many thousands of different applications, across tens of thousands of machines. With increasing updates from multiple actors within an environment, the communication between learner and actors is throttled, and this leads to an increase in convergence time.

The main responsibility here, wrote the researchers, is maintaining a balance between a large number of requests from actors and the learner job. They also state that adding caching components not only reduces the pressure on the learner to service the read requests but also further distributes the actors across multiple Borg cells. This, in turn, reduces computation overhead.
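The effect of placing a cache between the actors and the learner can be sketched as follows. The `Learner` and `ModelCache` classes and the version-based refresh rule are assumptions made for illustration, not Menger's actual components; the only point is that many actor reads collapse into a handful of learner reads.

```python
class Learner:
    """Holds the latest model and counts how often it is read directly."""

    def __init__(self):
        self.version = 0
        self.reads = 0

    def publish(self):
        self.version += 1  # a training step produced a new model

    def get_model(self):
        self.reads += 1
        return self.version


class ModelCache:
    """Serves actors from a local copy; refreshes only when a newer version exists."""

    def __init__(self, learner):
        self.learner = learner
        self.cached = None

    def get_model(self, latest_version):
        # latest_version stands in for a cheap notification channel (hypothetical).
        if self.cached != latest_version:
            self.cached = self.learner.get_model()
        return self.cached


learner = Learner()
cache = ModelCache(learner)
served = []
for step in range(3):
    learner.publish()
    for actor in range(1000):  # 1000 actors share one cache
        served.append(cache.get_model(learner.version))
```

In this toy run the learner is read three times while the actors are served 3,000 times, which is the kind of pressure relief the researchers attribute to caching.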

Menger uses Reverb, an open-sourced data storage system designed to implement experience replay in a variety of on-policy and off-policy algorithms, providing an efficient and flexible platform for machine learning applications. However, the researchers state that a single Reverb replay buffer service does not scale well in a distributed RL setting with multiple actors. Reverb’s sharding helped balance the load from a large number of actors across multiple servers, instead of throttling a single replay buffer server, while minimising the latency for each replay buffer server.
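To make the sharding idea concrete, here is a small sketch of a replay buffer split across independent shards. It does not use Reverb's API; the `ShardedReplayBuffer` class, the routing by actor id, and the size-weighted sampling rule are illustrative assumptions.

```python
import random
from collections import deque


class ShardedReplayBuffer:
    """Toy replay buffer split across several independent shards."""

    def __init__(self, num_shards, capacity_per_shard):
        self.shards = [deque(maxlen=capacity_per_shard) for _ in range(num_shards)]

    def insert(self, actor_id, transition):
        # Route by actor id so each shard sees roughly 1/num_shards of the writes.
        self.shards[actor_id % len(self.shards)].append(transition)

    def sample(self, batch_size, rng):
        # Pick shards in proportion to how many items they hold, so the
        # overall sample is uniform over all stored transitions.
        sizes = [len(s) for s in self.shards]
        picks = rng.choices(range(len(self.shards)), weights=sizes, k=batch_size)
        return [rng.choice(self.shards[i]) for i in picks]


buffer = ShardedReplayBuffer(num_shards=4, capacity_per_shard=100)
for actor_id in range(8):
    for t in range(10):
        buffer.insert(actor_id, (actor_id, t))

batch = buffer.sample(batch_size=5, rng=random.Random(0))
shard_sizes = [len(s) for s in buffer.shards]
```

With eight actors and four shards, each shard ends up holding a quarter of the transitions, so no single server absorbs all of the write traffic.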

The researchers claim that they have successfully used Menger infrastructure to drastically reduce the training time.

Key Takeaways

Reinforcement learning applications have slowly found their way into unexpected domains. But implementing RL techniques is tricky. The performance-accuracy trade-off looms large in research. With Menger, the researchers have tried to address the shortcomings of RL infrastructure. Its promising results in the intricate task of chip placement suggest it has the potential to shorten the chip design cycle and to help with other challenging real-world tasks as well.

Menger reduces the average read latency by a factor of ~4.0, leading to faster training iterations, especially for on-policy algorithms.
Efficient scaling of Menger is due to the sharding capability of Reverb.
The training time was reduced from ~8.6 hours down to merely one hour compared to the state-of-the-art.


Plan2Explore: Active Model-Building for Self-Supervised Visual Reinforcement Learning

aiuniverse — Tue, 06 Oct 2020 08:23:22 +0000

Source: bair.berkeley.edu

To operate successfully in unstructured open-world environments, autonomous intelligent agents need to solve many different tasks and learn new tasks quickly. Reinforcement learning has enabled artificial agents to solve complex tasks both in simulation and real-world. However, it requires collecting large amounts of experience in the environment for each individual task. Self-supervised reinforcement learning has emerged as an alternative, where the agent only follows an intrinsic objective that is independent of any individual task, analogously to unsupervised representation learning. After acquiring general and reusable knowledge about the environment through self-supervision, the agent can adapt to specific downstream tasks more efficiently.

In this post, we explain our recent publication that develops Plan2Explore. While many recent papers on self-supervised reinforcement learning have focused on model-free agents, our agent learns an internal world model that predicts the future outcomes of potential actions. The world model captures general knowledge, allowing Plan2Explore to quickly solve new tasks through planning in its own imagination. The world model further enables the agent to explore what it expects to be novel, rather than repeating what it found novel in the past. Plan2Explore obtains state-of-the-art zero-shot and few-shot performance on continuous control benchmarks with high-dimensional input images. To make it easy to experiment with our agent, we are open-sourcing the complete source code.

How does Plan2Explore work?

At a high level, Plan2Explore works by training a world model, exploring to maximize the information gain for the world model, and using the world model at test time to solve new tasks (see figure above). Thanks to effective exploration, the learned world model is general and captures information that can be used to solve multiple new tasks with no or few additional environment interactions. We discuss each part of the Plan2Explore algorithm individually below. We assume a basic understanding of reinforcement learning in this post and otherwise recommend these materials as an introduction.

Learning the world model

Plan2Explore learns a world model that predicts future outcomes given past observations $o_{1:t}$ and actions $a_{1:t}$ (see figure below). To handle high-dimensional image observations, we encode them into lower-dimensional features $h$ and use an RSSM model that predicts forward in a compact latent state-space $s$, from which the observations can be decoded. The latent state aggregates information from past observations that is helpful for future prediction, and is learned end-to-end using a variational objective.

A novelty metric for active model-building

To learn an accurate and general world model we need an exploration strategy that collects new and informative data. To achieve this, Plan2Explore uses a novelty metric derived from the model itself. The novelty metric measures the expected information gained about the environment upon observing the new data. As the figure below shows, this is approximated by the disagreement of an ensemble of K latent models. Intuitively, large latent disagreement reflects high model uncertainty, and obtaining the data point would reduce this uncertainty. By maximizing latent disagreement, Plan2Explore selects actions that lead to the largest information gain, therefore improving the model as quickly as possible.
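The disagreement measure can be sketched in a few lines. This is a toy version under stated assumptions: the ensemble members are hand-written linear functions rather than learned networks, and `make_model` and `latent_disagreement` are hypothetical names. What carries over is the idea of scoring a state-action pair by the variance of the ensemble's predictions.

```python
from statistics import pvariance


def make_model(weight):
    """Hypothetical one-step model: predicts the next latent feature vector."""
    def predict(state, action):
        return [weight * s + action for s in state]
    return predict


def latent_disagreement(models, state, action):
    """Variance of the ensemble's predictions, averaged over feature dimensions."""
    predictions = [m(state, action) for m in models]  # K predictions of D features
    per_dim = [pvariance(dim) for dim in zip(*predictions)]
    return sum(per_dim) / len(per_dim)


# K = 3 ensemble members that agree near the origin and diverge far from it,
# mimicking models fit to data collected only around state 0.
ensemble = [make_model(w) for w in (0.5, 1.0, 1.5)]

familiar = latent_disagreement(ensemble, state=[0.0, 0.0], action=1.0)
unfamiliar = latent_disagreement(ensemble, state=[2.0, 2.0], action=1.0)
```

Here `familiar` is zero because all members predict the same outcome, while `unfamiliar` is positive, so an agent maximizing this score is drawn toward the region where its models disagree.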

Planning for future novelty

To effectively maximize novelty, we need to know which parts of the environment are still unexplored. Most prior work on self-supervised exploration used model-free methods that reinforce past behavior that resulted in novel experience. This makes these methods slow to explore: since they can only repeat exploration behavior that was successful in the past, they are unlikely to stumble onto something novel. In contrast, Plan2Explore plans for expected novelty by measuring model uncertainty of imagined future outcomes. By seeking trajectories that have the highest uncertainty, Plan2Explore explores exactly the parts of the environments that were previously unknown.

To choose actions $a$ that optimize the exploration objective, Plan2Explore leverages the learned world model as shown in the figure below. The actions are selected to maximize the expected novelty of the entire future sequence $s_{t:T}$, using imaginary rollouts of the world model to estimate the novelty. To solve this optimization problem, we use the Dreamer agent, which learns a policy $\pi_\phi$ using a value function and analytic gradients through the model. The policy is learned completely inside the imagination of the world model. During exploration, this imagination training ensures that our exploration policy is always up-to-date with the current world model and collects data that are still novel.
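The sketch below illustrates planning for expected novelty with imagined rollouts. It deliberately swaps in exhaustive search over short action sequences in place of the Dreamer policy and value function described above, and it uses a hypothetical one-dimensional ensemble of linear models. It is meant only to show how novelty can be estimated for states the agent has not yet visited.

```python
import itertools
from statistics import pvariance

# Hypothetical ensemble of 1-D world models: next_state = w * state + action.
ENSEMBLE_WEIGHTS = (0.9, 1.0, 1.1)


def imagine(state, actions):
    """Roll the ensemble forward in imagination and sum the disagreement."""
    novelty = 0.0
    for action in actions:
        predictions = [w * state + action for w in ENSEMBLE_WEIGHTS]
        novelty += pvariance(predictions)  # disagreement at this step
        state = sum(predictions) / len(predictions)  # continue from the mean
    return novelty


def plan_exploration(state, horizon=3, action_set=(-1.0, 0.0, 1.0)):
    """Return the action sequence with the highest expected novelty."""
    candidates = itertools.product(action_set, repeat=horizon)
    return max(candidates, key=lambda seq: imagine(state, seq))


best_sequence = plan_exploration(state=1.0)
```

In this toy model the members disagree more the further the state is from zero, so the planner chooses to keep moving away from the origin; no real environment step is taken while scoring the candidates.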

Curiosity-driven exploration behavior

We evaluate Plan2Explore on 20 continuous control tasks from the DeepMind Control Suite. The agent only has access to image observations and no proprioceptive information. Instead of random exploration, which fails to take the agent far from the initial position, Plan2Explore leads to diverse movement strategies like jumping, running, and flipping. Later, we will see that these are effective practice episodes that enable the agent to quickly learn to solve various continuous control tasks.

Solving tasks with the world model

Once an accurate and general world model is learned, we test Plan2Explore on previously unseen tasks. Given a task specified with a reward function, we use the model to optimize a policy for that task. Similar to our exploration procedure, we optimize a new value function and a new policy head for the downstream task. This optimization uses only predictions imagined by the model, enabling Plan2Explore to solve new downstream tasks in a zero-shot manner without any additional interaction with the world.

The following plot shows the performance of Plan2Explore on tasks from DM Control Suite. Before 1 million environment steps, the agent doesn’t know the task and simply explores. The agent solves the task as soon as it is provided at 1 million steps, and keeps improving fast in a few-shot regime after that.

Plan2Explore is able to solve most of the tasks we benchmarked. Since prior work on self-supervised reinforcement learning used model-free agents that are not able to adapt in a zero-shot manner (such as ICM), or did not use image observations, we compare by adapting this prior work to our model-based Plan2Explore setup. Our latent disagreement objective outperforms other previously proposed objectives. More interestingly, the final performance of Plan2Explore is comparable to the state-of-the-art oracle agent that requires task rewards throughout training. In our paper, we further report performance of Plan2Explore in the zero-shot setting where the agent needs to solve the task before any task-oriented practice.

Future directions

Plan2Explore demonstrates that effective behavior can be learned through self-supervised exploration only. This opens multiple avenues for future research:

First, to apply self-supervised RL to a variety of settings, future work will investigate different ways of specifying the task and deriving behavior from the world model. For example, the task could be specified with a demonstration, description of the desired goal state, or communicated to the agent in natural language.
Second, while Plan2Explore is completely self-supervised, in many cases a weak supervision signal is available, such as in hard exploration games, human-in-the-loop learning, or real life. In such a semi-supervised setting, it is interesting to investigate how weak supervision can be used to steer exploration towards the relevant parts of the environment.
Finally, Plan2Explore has the potential to improve the data efficiency of real-world robotic systems, where exploration is costly and time-consuming, and the final task is often unknown in advance.

By designing a scalable way of planning to explore in unstructured environments with visual observations, Plan2Explore provides an important step toward self-supervised intelligent machines.


Is AI an Existential Threat?

aiuniverse — Mon, 05 Oct 2020 08:39:43 +0000

Source: unite.ai

When discussing Artificial Intelligence (AI), a common debate is whether AI is an existential threat. The answer requires understanding the technology behind Machine Learning (ML), and recognizing that humans have a tendency to anthropomorphize. We will explore two different types of AI: Artificial Narrow Intelligence (ANI), which is available now and is cause for concern, and Artificial General Intelligence (AGI), the threat most commonly associated with apocalyptic renditions of AI.

Artificial Narrow Intelligence Threats

To understand what ANI is, you simply need to understand that every single AI application that is currently available is a form of ANI. These are AI systems with a narrow field of specialty. For example, autonomous vehicles use AI that is designed with the sole purpose of moving a vehicle from point A to point B. Another type of ANI might be a chess program that is optimized to play chess; even if the chess program continuously improves itself by using reinforcement learning, it will never be able to operate an autonomous vehicle.

Because they focus on whatever operation they are responsible for, ANI systems are unable to use generalized learning to take over the world. That is the good news; the bad news is that, because it relies on a human operator, an ANI system is susceptible to biased data, human error, or even worse, a rogue human operator.

AI Surveillance

There may be no greater danger to humanity than humans using AI to invade privacy, and in some cases using AI surveillance to completely prevent people from moving freely. China, Russia, and other nations pushed through regulations during COVID-19 that enable them to monitor and control the movement of their respective populations. These are laws which, once in place, are difficult to remove, especially in societies that feature autocratic leaders.

In China, cameras are stationed outside of people’s homes, and in some cases inside the person’s home. Each time a member of the household leaves, an AI monitors the time of arrival and departure, and if necessary alerts the authorities. As if that was not sufficient, with the assistance of facial recognition technology, China is able to track the movement of each person every time they are identified by a camera. This offers absolute power to the entity controlling the AI, and absolutely zero recourse to its citizens.

This scenario is dangerous because corrupt governments can carefully monitor the movements of journalists, political opponents, or anyone who dares to question the authority of the government. It is easy to understand how journalists and citizens would be reluctant to criticize governments when every movement is being monitored.

There are fortunately many cities that are fighting to prevent facial recognition from infiltrating their cities. Notably, Portland, Oregon has recently passed a law that blocks facial recognition from being used unnecessarily in the city. While these changes in regulation may have gone unnoticed by the general public, in the future these regulations could be the difference between cities that offer some type of autonomy and freedom, and cities that feel oppressive.

Autonomous Weapons and Drones

Over 4500 AI researchers have been calling for a ban on autonomous weapons and have created the Ban Lethal Autonomous Weapons website. The group has many notable non-profits as signatories, such as Human Rights Watch, Amnesty International, and the Future of Life Institute, which itself has a stellar scientific advisory board including Elon Musk, Nick Bostrom, and Stuart Russell.

Before continuing, I will share this quote from the Future of Life Institute which best explains why there is clear cause for concern: “In contrast to semi-autonomous weapons that require human oversight to ensure that each target is validated as ethically and legally legitimate, such fully autonomous weapons select and engage targets without human intervention, representing complete automation of lethal harm.”

Currently, smart bombs are deployed with a target selected by a human, and the bomb then uses AI to plot a course and to land on its target. The problem is what happens when we decide to completely remove the human from the equation?

When an AI chooses which humans to target, as well as the type of collateral damage that is deemed acceptable, we may have crossed a point of no return. This is why so many AI researchers are opposed to researching anything that is remotely related to autonomous weapons.

There are multiple problems with simply attempting to block autonomous weapons research. The first problem is even if advanced nations such as Canada, the USA, and most of Europe choose to agree to the ban, it doesn’t mean rogue nations such as China, North Korea, Iran, and Russia will play along. The second and bigger problem is that AI research and applications that are designed for use in one field, may be used in a completely unrelated field.

For example, computer vision continuously improves and is important for developing autonomous vehicles, precision medicine, and other important use cases. It is also fundamentally important for regular drones or drones which could be modified to become autonomous. One potential use case of advanced drone technology is developing drones that can monitor and fight forest fires. This would completely remove firefighters from harm’s way. In order to do this, you would need to build drones that are able to fly into harm’s way, to navigate in low or zero visibility, and to drop water with impeccable precision. It is not a far stretch to then use this identical technology in an autonomous drone that is designed to selectively target humans.

It is a dangerous predicament, and at this point in time no one fully understands the implications of advancing or attempting to block the development of autonomous weapons. It is nonetheless something that we need to keep our eyes on; enhancing whistleblower protection may enable those in the field to report abuses.

Rogue operator aside, what happens if AI bias creeps into AI technology that is designed to be an autonomous weapon?

AI Bias

One of the most underreported threats of AI is AI bias. This is simple to understand, as most of it is unintentional. AI bias slips in when an AI reviews data that is fed to it by humans: using pattern recognition on that data, the AI reaches incorrect conclusions which may have negative repercussions on society. For example, an AI that is fed literature from the past century on how to identify medical personnel may reach the unwanted sexist conclusion that women are always nurses, and men are always doctors.

A more dangerous scenario is when AI that is used to sentence convicted criminals is biased towards giving longer prison sentences to minorities. The AI’s criminal risk assessment algorithms are simply studying patterns in the data that has been fed into the system. This data indicates that historically certain minorities are more likely to re-offend, even when this is due to poor datasets which may be influenced by police racial profiling. The biased AI then reinforces negative human policies. This is why AI should be a guideline, never judge and jury.

Returning to autonomous weapons, if we have an AI which is biased against certain ethnic groups, it could choose to target certain individuals based on biased data, and it could go so far as ensuring that any type of collateral damage impacts certain demographics less than others. For example, when targeting a terrorist, before attacking it could wait until the terrorist is surrounded by those who follow the Muslim faith instead of Christians.

Fortunately, it has been proven that AI designed by diverse teams is less prone to bias. This is reason enough for enterprises to attempt, whenever possible, to hire a diverse, well-rounded team.

Artificial General Intelligence Threats

It should be stated that while AI is advancing at an exponential pace, we have still not achieved AGI. When we will reach AGI is up for debate, and everyone has a different answer as to a timeline. I personally subscribe to the views of Ray Kurzweil, inventor, futurist, and author of ‘The Singularity is Near’, who believes that we will have achieved AGI by 2029.

AGI will be the most transformational technology in the world. Within weeks of AI achieving human-level intelligence, it will then reach superintelligence which is defined as intelligence that far surpasses that of a human.

With this level of intelligence an AGI could quickly absorb all human knowledge and use pattern recognition to identify biomarkers that cause health issues, and then treat those conditions by using data science. It could create nanobots that enter the bloodstream to target cancer cells or other attack vectors. The list of accomplishments an AGI is capable of is infinite. We’ve previously explored some of the benefits of AGI.

The problem is that humans may no longer be able to control the AI. Elon Musk describes it this way: “With artificial intelligence we are summoning the demon.” The question is whether we will be able to control this demon.

Achieving AGI may simply be impossible until an AI leaves a simulation setting to truly interact in our open-ended world. Self-awareness cannot be designed; instead, it is believed that an emergent consciousness is likely to evolve when an AI has a robotic body featuring multiple input streams. These inputs may include tactile stimulation, voice recognition with enhanced natural language understanding, and augmented computer vision.

The advanced AI may be programmed with altruistic motives and want to save the planet. Unfortunately, the AI may use data science, or even a decision tree to arrive at unwanted faulty logic, such as assessing that it is necessary to sterilize humans, or eliminate some of the human population in order to control human overpopulation.

Careful thought and deliberation are needed when building an AI with intelligence that will far surpass that of a human. Many nightmare scenarios have been explored.

Professor Nick Bostrom, in the Paperclip Maximizer argument, has argued that a misconfigured AGI, if instructed to produce paperclips, would simply consume all of Earth’s resources to produce these paperclips. While this seems a little far-fetched, a more pragmatic viewpoint is that an AGI could be controlled by a rogue state or a corporation with poor ethics. This entity could train the AGI to maximize profits, and in this case, with poor programming and zero remorse, it could choose to bankrupt competitors, destroy supply chains, hack the stock market, liquidate bank accounts, or attack political opponents.

This is when we need to remember that humans tend to anthropomorphize. We cannot give the AI human-type emotions, wants, or desires. While there are diabolical humans who kill for pleasure, there is no reason to believe that an AI would be susceptible to this type of behavior. It is inconceivable for humans to even consider how an AI would view the world.

Instead what we need to do is teach AI to always be deferential to a human. The AI should always have a human confirm any changes in settings, and there should always be a fail-safe mechanism. Then again, it has been argued that AI will simply replicate itself in the cloud, and by the time we realize it is self-aware it may be too late.

This is why it is so important to open source as much AI as possible and to have rational discussions regarding these issues.

Summary

There are many challenges to AI; fortunately, we still have many years to collectively figure out the future path that we want AGI to take. We should in the short term focus on creating a diverse AI workforce that includes as many women as men, and as many ethnic groups with diverse points of view as possible.

We should also create whistleblower protections for researchers who are working on AI, and we should pass laws and regulations which prevent widespread abuse of state or company-wide surveillance. Humans have a once-in-a-lifetime opportunity to improve the human condition with the assistance of AI. We just need to ensure that we carefully create a societal framework that best enables the positives while mitigating the negatives, which include existential threats.


Artificial intelligence to regulate the level of consciousness?

aiuniverse — Wed, 30 Sep 2020 09:22:09 +0000

Source: genethique.org

Researchers at the Massachusetts Institute of Technology (MIT) and Massachusetts General Hospital recently conducted a study on the possibility of using deep learning techniques “to monitor the level of unconsciousness in patients who require anesthesia for medical procedure”. Their paper[1], to be published in the proceedings of the 2020 International Conference on Artificial Intelligence in Medicine, was named best paper presented at the conference.

Gabriel Schamberg, one of the researchers who conducted this study, explains that he developed “a deep neural network” and “trained it to control the dosage of the anesthetic using reinforcement learning in a simulated environment”. In particular, the researchers looked at the dosage of Propofol, “a drug that lowers the level of consciousness and is commonly used to perform general anesthesia or sedation on patients undergoing medical procedures.”

“Deep neural networks make it possible to build a model with a lot of continuous input data,” explains the researcher, “so that our method generated more coherent control policies than previous policies.”

The researchers’ long-term goal is to use the model to “help anesthesiologists identify the ideal dose of Propofol for each patient in order to achieve different levels of unconsciousness”. So far, the model has only been tested in simulations. “We would now like to test [it] on humans in controlled clinical settings,” said Gabriel Schamberg.


Learning optimal mitigation strategies through agent based reinforcement learning

aiuniverse — Tue, 29 Sep 2020 07:56:18 +0000

Source: fields.utoronto.ca

Covid-19 has brought the mathematical modeling community together to produce a wide range of analyses, forecasting and scenario modeling efforts. In our work, we propose a novel approach that uses Reinforcement Learning (RL) to answer the question: “What agent behaviours reduce the spread of Covid-19?”. This is done by creating an agent-based simulation environment and modeling the environment as a multi-agent Markov Decision Process (MDP). By giving agents the freedom to select their own actions and learn from their experiences, this Machine Learning approach allows agents to learn behavioural policies that reduce the spread of Covid-19. These behaviours can be mined and conditioned on demographic attributes for analysis, with the hope of providing a more granular analysis to inform public health policy makers. Our environment is built using open data from Statistics Canada (census, surveys) and can be modified to a particular country or region. In this presentation we cover the mathematical framework of MDPs, discuss the agent environment and data sources, and analyze the results both in terms of what behaviours the agents learn and in terms of the reduction in spread of Covid-19 over various baselines. We discuss the generality of our approach and how it can be modified as our understanding of the infection changes.
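
To give a feel for what a multi-agent MDP environment of this kind involves, here is a toy Python sketch. It is not the Statistics Canada environment: the two-action set, the reward values, the transmission probability and the names `Agent`, `step` and `simulate` are all assumptions made for illustration. Each agent observes its own state, picks an action, and receives a reward. The sketch compares fixed behavioural policies rather than learning them.

```python
import random
from dataclasses import dataclass

STAY_HOME, GO_OUT = 0, 1  # hypothetical action set


@dataclass
class Agent:
    infected: bool = False


def step(agents, actions, rng, p_transmit=0.3):
    """Apply one joint action and return the per-agent rewards."""
    out = [i for i, action in enumerate(actions) if action == GO_OUT]
    # Exposure only happens if at least one infected agent is out.
    infectious_out = any(agents[i].infected for i in out)
    rewards = [0.0] * len(agents)
    for i in out:
        rewards[i] += 1.0  # assumed benefit of going out
        if infectious_out and not agents[i].infected and rng.random() < p_transmit:
            agents[i].infected = True
            rewards[i] -= 10.0  # assumed cost of becoming infected
    return rewards


def simulate(policy, n_agents=20, n_steps=30, seed=0):
    """Run one episode and return the final number of infected agents."""
    rng = random.Random(seed)
    agents = [Agent(infected=(i == 0)) for i in range(n_agents)]
    for _ in range(n_steps):
        actions = [policy(agent) for agent in agents]
        step(agents, actions, rng)
    return sum(agent.infected for agent in agents)


def isolate_if_infected(agent):
    """A behaviour of the kind an RL agent might discover."""
    return STAY_HOME if agent.infected else GO_OUT


print(simulate(lambda agent: GO_OUT), simulate(isolate_if_infected))
```

In this toy setting the "isolate if infected" policy keeps the infection count at its initial value of one, because the only infected agent never goes out. A learning algorithm would replace the hand-written policies with ones updated from the rewards.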

This work is a collaborative effort between different divisions within Statistics Canada. Nicholas Denis, Blair Drummond, Alex El-Hajj and Krishna Gopaluni are data scientists/data engineers within the Data Science Division (DScD) of Statistics Canada. The DScD provides modern data science solutions to clients using cutting edge machine learning techniques. Yamina Abiza is a member of IT Operations Data Science and Data Engineering Service, Statistics Canada, which works together with data scientists to provide platform solutions and tools combined with efficient and robust development/data engineering skills to advance data science objectives at velocity. Deirdre Hennessey is a member of the Health Analysis Division, Statistics Canada, providing high quality, relevant, and comprehensive information on the health status of the population and on the health care system.


Robotic manipulation research: From the laboratory to the real world

aiuniverse — Mon, 28 Sep 2020 09:10:57 +0000

Source: openaccessgovernment.org

Designing robots to perform human tasks is one of the biggest challenges in robotics. Robotics researchers have attempted to solve problems of this sort for many years; however, only a small number of human tasks can be performed by robots thus far. This fact may seem strange, as humans are able to perform most manual tasks in everyday life without any difficulties. Humans are aware of the difficulty of tasks that require delicate force adjustments or fine position adjustments, and humans tend to think that the easy tasks for a human to perform are also easy for robots.

In the homunculus diagram compiled by Penfield, which shows the correspondence between the motor and somatosensory cortices and body parts, the areas of the brain related to fingers and palms occupy one-third of the motor cortex and one-quarter of the sensory cortex. Hence, human hands are often called the second brain. This fact suggests that the functions of hands and fingers, which we realize without thinking, are based on the enormous amount of tacit knowledge stored in the brain. In other words, the key to robotic manipulation research is determining how to allow robots to utilize the tacit knowledge stored in the brain. Studying the functions of robotic hands and fingers is a profound problem directly related to the understanding of human intelligence.

This paper will describe the robotic manipulation research mainly done in the author’s laboratory and its applications. As mentioned above, research on robotic manipulation is academically interesting, but at the same time, it is increasingly relevant to many fields of our daily life and industry.

Academic research

Fig. 1 shows the relationship between the research items necessary for the automatic generation of robotic manipulation motions. One intuitive approach is to apply robotic motion planning. In this figure, a three-layered structure for robotic motion planning is shown in the center. The top level plans the task procedures; for example, if the robot is instructed to perform the target task, this layer plans a sequence of motion to realize the task. Next, the middle layer plans the motion of robotic manipulation to accomplish the target task. Then, the lowest layer plans the posture of the hand to grasp the target object to complete the required task. This three-layered motion planning is called integrated task and motion planning (TAMP). To date, various studies have been conducted using TAMP to automatically generate robotic manipulation motions.1, 2

However, the tasks that can be realized by applying only TAMP are limited. For example, a human could cook a meal by referring to food recipes. A robot must also complete the required task when given human-understandable work instructions, such as food recipes and verbal instructions. In this case, a robot must be equipped with tacit knowledge, like that possessed by humans, and must adaptively employ information not written in the work instructions to carry out the task.

Secondly, we rarely have accurate geometrical information about the surrounding environment or the grasped object’s physical parameters. To compensate for this missing or inaccurate information, we need to provide sensor feedback. We must plan a robot’s motion to compensate for such missing or inaccurate information effectively, and the task must be performed by taking this limitation into account. It is also often necessary to design a specially coordinated hand for each task, and to learn movements from human demonstrations. In addition, although it would be desirable if the robot performed all tasks by equipping a single universal hand attached at the arm tip, this is not always possible in practice. In many cases, we have to carefully design robotic hands that correspond to specific tasks and are unsuited to others.

Fig. 2 shows an example of robotic motion planning where an elastic ring-shaped object is installed in a cylinder using a dual-arm industrial manipulator.3 We planned a sequence of motion for two arms while minimizing the ring’s elastic energy. First, the left hand moves to the target position while the right hand remains stationary. Then, the right hand moves to the target position while the left hand remains stationary. After iterating this operation for a few steps, the robot can install the ring-shaped object in the cylinder.
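
The alternating scheme can be sketched in a few lines of Python. This is a simplified one-dimensional illustration, not the planner from the cited work: the spring-like `elastic_energy` model, the step bound `max_step` and the function names are assumptions. The hands take turns moving a bounded step toward their goals, and the energy of each intermediate configuration is recorded so that it can be checked against a limit.

```python
def elastic_energy(left, right, rest_length=1.0, stiffness=1.0):
    """Hypothetical spring model: energy grows with deviation from rest length."""
    stretch = abs(right - left) - rest_length
    return 0.5 * stiffness * stretch ** 2


def move_toward(current, target, max_step):
    """Move at most max_step toward the target, landing on it exactly if close."""
    delta = target - current
    if abs(delta) <= max_step:
        return target
    return current + max_step if delta > 0 else current - max_step


def plan_alternating(left, right, left_goal, right_goal, max_step=0.25):
    """Return a list of (left, right, energy) waypoints."""
    plan = [(left, right, elastic_energy(left, right))]
    while (left, right) != (left_goal, right_goal):
        # Left hand moves while the right hand remains stationary.
        left = move_toward(left, left_goal, max_step)
        plan.append((left, right, elastic_energy(left, right)))
        # Right hand moves while the left hand remains stationary.
        right = move_toward(right, right_goal, max_step)
        plan.append((left, right, elastic_energy(left, right)))
    return plan


plan = plan_alternating(0.0, 1.0, 1.0, 2.0)
peak_energy = max(energy for _, _, energy in plan)
print(len(plan), peak_energy)
```

Moving the left hand all the way to its goal in one motion would compress this toy ring to zero length; alternating small steps keeps the deformation, and therefore the energy, bounded by a single step.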

In contrast, Fig. 3 shows an example of understanding a human-performed assembly task for the purpose of transferring it to a robot.4 This figure shows a situation where a human assembles a toy airplane. By capturing a human’s motion while assembling a product, we try to identify what kind of task the human is performing. However, because humans can perform a variety of tasks, we must determine the task being performed from a large number of candidates. To avoid this problem, we first identify the grasped object and then identify the task associated with the grasped object.

Real-world applications

This section discusses the real-world applications of robotic manipulation research. One of the foremost areas in which human tasks are expected to be performed by robots is factory manufacturing. As depicted in “Modern Times” and other films, factory manufacturing is harsh on humans, and it remains so even today, with modern science and technology. This is especially noticeable in the assembly and inspection processes for high-mix low-volume manufacturing. Such processes require frequent task changes, which makes robotization difficult. Attempts have been made to introduce AI to industrial robots to solve these problems. Fig. 4 shows one such example of camera-lens assembly.5 By using reinforcement learning, we can effectively obtain the trajectory of the parts as well as the force control parameters.

Robotic manipulation research is also expected to impact our daily lives. Many countries, including Japan, are experiencing a rapid decline in birth rate and an aging population. The shortage of caregivers for the elderly and physically challenged has become a social problem. At this time, daily life assistance for tasks such as feeding, dressing, and bathing is very hard work, and it is hoped that robots will be able to replace humans to perform these functions. Moreover, elderly and physically challenged people in nursing homes are provided with individualized meals, and it is expected that the production and serving of these meals will also be automated. The robotic manipulation technology we are studying is critically important in order to automate these tasks.

Conclusion

In this article, we discussed robotic manipulation research from both academic and practical points of view. Robotic manipulation is a key technology for introducing robots into certain fields. With recent advancements in AI technology, robotic manipulation has also made significant advancements. We are working hard to introduce robots in many fields through the advancement of robotic manipulation technology.


Engineers Develop New Machine-Learning Method Capable of Cutting Energy Use

aiuniverse — Mon, 28 Sep 2020 07:32:34 +0000

Source: unite.ai

Engineers at the Swiss Center for Electronics and Microtechnology (CSEM) have developed a new machine-learning method capable of cutting energy use, as well as allowing artificial intelligence (AI) to complete tasks that were once considered too sensitive.

Reinforcement Learning Limitations

Reinforcement learning, where a computer continuously improves upon itself by learning from its past experiences, is a major aspect of artificial intelligence. However, this technology is often difficult to apply to real-life scenarios, such as training climate-control systems. Applications like this cannot tolerate the drastic temperature changes that reinforcement learning would bring on while it learns.

This exact issue is what the CSEM engineers set out to address, and it led them to the new approach. The engineers demonstrated that computers could first be trained on simplified theoretical models and only then be turned loose on real-life systems. This makes the machine learning process more accurate by the time it reaches the real-life system, because it has already learned from trial and error with the theoretical model. As a result, the real-life system sees no drastic fluctuations, which solves the example issue with climate-control technology.

Pierre-Jean Alet is head of smart energy systems research at CSEM, as well as co-author of the study.

“It’s like learning the driver’s manual before you start a car,” Alet says. “With this pre-training step, computers build up a knowledge base they can draw on so they aren’t flying blind as they search for the right answer.”

Energy Cuts

One of the most important aspects of this new method is that it can cut energy use by over 20%. The engineers tested the method on a heating, ventilation and air conditioning (HVAC) system, which was located in a 100-room building.

The engineers relied on three steps, the first of which was training a computer on a “virtual model.” This model was constructed through simple equations explaining the behavior of the building. Real building data such as temperature, weather conditions and other variables were then fed to the computer, which resulted in more accurate training. The last step was to allow the computer to run the reinforcement learning algorithms, which would eventually result in the best approach forward for the HVAC system.
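
The idea of pre-training on a simplified model before touching the real system can be sketched as follows. This is not the CSEM method: the temperature grid, both dynamics functions, the reward and all names are assumptions for illustration, and exhaustive sweeps of tabular Q-value updates stand in for the reinforcement learning step. A second simulator plays the role of the real building.

```python
T_MIN, T_MAX, TARGET = 15, 25, 21  # hypothetical temperature grid (deg C)
OFF, HEAT = 0, 1
GAMMA = 0.9  # discount factor


def simple_model(temp, action):
    """Crude 'virtual model': heating adds one degree per step."""
    delta = 1 if action == HEAT else -1
    return min(T_MAX, max(T_MIN, temp + delta))


def real_system(temp, action):
    """Stand-in for the real building: heating is stronger than modelled."""
    delta = 2 if action == HEAT else -1
    return min(T_MAX, max(T_MIN, temp + delta))


def reward(next_temp, action):
    """Penalise distance from the target, plus a small energy cost for heating."""
    return -abs(next_temp - TARGET) - 0.1 * action


def q_sweeps(q, dynamics, n_sweeps):
    """Update every Q-value in place, n_sweeps times, under the given dynamics."""
    for _ in range(n_sweeps):
        for temp in range(T_MIN, T_MAX + 1):
            for action in (OFF, HEAT):
                nxt = dynamics(temp, action)
                q[temp][action] = reward(nxt, action) + GAMMA * max(q[nxt])
    return q


def greedy(q):
    """Pick the highest-valued action for every temperature."""
    return {temp: max((OFF, HEAT), key=lambda a: q[temp][a]) for temp in q}


q = {temp: [0.0, 0.0] for temp in range(T_MIN, T_MAX + 1)}
q_sweeps(q, simple_model, 200)   # pre-train on the simplified model
pretrained_policy = greedy(q)
q_sweeps(q, real_system, 200)    # continue from those values on the "real" system
final_policy = greedy(q)
print(pretrained_policy, final_policy)
```

Even before it sees the stand-in for the real building, the pre-trained policy already heats a cold room and leaves a warm one alone, so fine-tuning starts from sensible behaviour rather than from random exploration.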

The new method developed by the CSEM engineers could have big implications for machine learning. Many applications that were once thought to be “untouchable” by reinforcement learning, like those with large fluctuations, could now be approached in a new manner. This would result in lower energy usage, lower financial costs and many other benefits.

The research, titled “A hybrid learning method for system identification and optimal control,” was published in the journal IEEE Transactions on Neural Networks and Learning Systems.

The authors include: Baptiste Schubnel, Rafael E. Carrillo, Pierre-Jean Alet and Andreas Hutter.
