<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Reinforcement Learning Archives - Artificial Intelligence</title>
	<atom:link href="https://www.aiuniverse.xyz/tag/reinforcement-learning/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aiuniverse.xyz/tag/reinforcement-learning/</link>
	<description>Exploring the universe of Intelligence</description>
	<lastBuildDate>Wed, 18 Nov 2020 05:28:21 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>
	<item>
		<title>DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning</title>
		<link>https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/</link>
					<comments>https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 18 Nov 2020 05:28:19 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[2D environments]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[DeepMind]]></category>
		<category><![CDATA[Lab2D]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=12371</guid>

					<description><![CDATA[<p>Source: computing.co.uk Alphabet subsidiary DeepMind announced on Monday that it has open-sourced Lab2D, a scalable environment simulator for artificial intelligence (AI) research that facilitates researcher-led experimentation with environment <a class="read-more-link" href="https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/">DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: computing.co.uk</p>



<p class="wp-block-paragraph">Alphabet subsidiary DeepMind announced on Monday that it has open-sourced Lab2D, a scalable environment simulator for artificial intelligence (AI) research that facilitates researcher-led experimentation with environment design.</p>



<p class="wp-block-paragraph">DeepMind describes Lab2D as a system designed to support creation of two-dimensional (2D) layered, discrete &#8220;grid-world&#8221; environments, in which pieces move around in the same way as chess pieces move around on a chess board.</p>



<p class="wp-block-paragraph">The system is particularly tailored for multi-agent reinforcement learning, according to Lab2D researchers.</p>



<p class="wp-block-paragraph">The computationally intensive engine for Lab2D is written in C++ for efficiency, while most of the level-specific logic is written in Lua.</p>



<p class="wp-block-paragraph">&#8220;The environments are &#8216;grid worlds&#8217;, which are defined with a combination of simple text-based maps for the layout of the world, and Lua code for its behaviour,&#8221; the researchers state in their paper.</p>



<p class="wp-block-paragraph">&#8220;Machine learning agents interact with these environments through one of two APIs, the Python <em>dm_env</em> API or a custom <em>C</em> API (which is also used by DeepMind Lab).&#8221;</p>
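<p class="wp-block-paragraph">To make the idea of a text-defined grid world concrete, here is a minimal Python sketch. It is not the Lab2D or <em>dm_env</em> API: the class <em>TextGridWorld</em>, the map symbols and the reward scheme are all assumptions made up for illustration, and the behaviour that Lab2D would express in Lua is written directly in Python here.</p>

```python
# Hypothetical illustration only: this is NOT the Lab2D or dm_env API.
ASCII_MAP = """
#####
#A.G#
#####
""".strip()

# Each action is a (row, column) offset on the grid.
MOVES = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}


class TextGridWorld:
    """A tiny grid world whose layout comes from a text map.

    '#' is a wall, 'A' the agent start, 'G' the goal, '.' free space.
    The map is assumed to be fully enclosed by walls.
    """

    def __init__(self, ascii_map):
        self.rows = ascii_map.splitlines()
        self.start = self._find("A")
        self.goal = self._find("G")
        self.agent = self.start

    def _find(self, symbol):
        for r, row in enumerate(self.rows):
            c = row.find(symbol)
            if c != -1:
                return (r, c)
        raise ValueError(f"symbol {symbol!r} not in map")

    def reset(self):
        self.agent = self.start
        return self.agent

    def step(self, action):
        dr, dc = MOVES[action]
        r, c = self.agent[0] + dr, self.agent[1] + dc
        if self.rows[r][c] != "#":  # walls block movement
            self.agent = (r, c)
        done = self.agent == self.goal
        reward = 1.0 if done else 0.0
        return self.agent, reward, done


env = TextGridWorld(ASCII_MAP)
env.reset()
env.step("right")
observation, reward, done = env.step("right")
```

<p class="wp-block-paragraph">With this map the agent starts one free cell away from the goal, so the second step to the right ends the episode with a reward of 1.0. Changing the layout only requires editing the text map, which is the property the researchers highlight.</p>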



<p class="wp-block-paragraph">The researchers note that in the rush to create artificial general intelligence which will work in any environment, &#8216;tinkering&#8217; with environmental variables has become unfashionable. Nevertheless, in real-world use cases simulated environments are essential to discover how systems based on reinforcement learning develop an understanding of the conditions in which they operate.</p>



<p class="wp-block-paragraph">They make the case that 2D environments are inherently easier to understand than three-dimensional ones, at very little, if any, loss of expressiveness, and are more performant and easier to use.</p>



<p class="wp-block-paragraph">&#8220;Rich complexity along numerous dimensions can be studied in 2D just as readily as in 3D, if not more so.&#8221;</p>



<p class="wp-block-paragraph">They note that 2D worlds have been successfully used to study problems as diverse as navigation, social complexity, imperfect information, and abstract reasoning.</p>



<p class="wp-block-paragraph">&#8220;2D worlds can often capture the relevant complexity of the problem at hand without the need for continuous-time physical environments.&#8221;</p>



<p class="wp-block-paragraph">Another advantage of 2D worlds is that they are easier to program and design than their 3D counterparts. This is particularly true when the 3D world actually exploits space or physical dynamics beyond the capabilities of 2D ones.</p>



<p class="wp-block-paragraph">Moreover, 2D worlds do not need complex 3D assets to be evocative, or any reasoning about lighting, shaders, and projections.</p>



<p class="wp-block-paragraph">The decision to open-source Lab2D comes after DeepMind released OpenSpiel, a reinforcement learning framework for games, designed to &#8220;promote general multi-agent reinforcement learning across many different game types, in a similar way as general game-playing but with a heavy emphasis on learning and not in competition form.&#8221;</p>



<p class="wp-block-paragraph">Lab2D seeks to build on this work by providing a means to study how agents learn.</p>



<p class="wp-block-paragraph">&#8220;We think that progress toward artificial general intelligence requires robust simulation platforms to enable in silico exploration of agent learning, skill acquisition, and careful measurement. We hope that the system we introduce here, DeepMind Lab2D, can fill this role.&#8221;</p>
<p>The post <a href="https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/">DeepMind open-sources Lab2D to support creation of 2D environments for AI and machine learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/deepmind-open-sources-lab2d-to-support-creation-of-2d-environments-for-ai-and-machine-learning/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Researchers detail LaND, AI that learns from autonomous vehicle disengagements</title>
		<link>https://www.aiuniverse.xyz/researchers-detail-land-ai-that-learns-from-autonomous-vehicle-disengagements/</link>
					<comments>https://www.aiuniverse.xyz/researchers-detail-land-ai-that-learns-from-autonomous-vehicle-disengagements/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 17 Oct 2020 06:20:15 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI researchers]]></category>
		<category><![CDATA[autonomous]]></category>
		<category><![CDATA[researchers]]></category>
		<category><![CDATA[vehicle]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=12297</guid>

					<description><![CDATA[<p>Source: venturebeat.com UC Berkeley AI researchers say they’ve created AI for autonomous vehicles driving in unseen, real-world landscapes that outperforms leading methods for delivery robots driving on <a class="read-more-link" href="https://www.aiuniverse.xyz/researchers-detail-land-ai-that-learns-from-autonomous-vehicle-disengagements/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/researchers-detail-land-ai-that-learns-from-autonomous-vehicle-disengagements/">Researchers detail LaND, AI that learns from autonomous vehicle disengagements</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: venturebeat.com</p>



<p class="wp-block-paragraph">UC Berkeley AI researchers say they’ve created AI for autonomous vehicles driving in unseen, real-world landscapes that outperforms leading methods for delivery robots driving on sidewalks. Called LaND, for Learning to Navigate from Disengagements, the navigation system studies disengagement events, then predicts when disengagements will happen in the future. The approach is meant to provide what the researchers call a needed shift in perspective about disengagements for the AI community.</p>



<p class="wp-block-paragraph">A disengagement describes each instance when an autonomous system encounters challenging conditions and must turn control back over to a human operator. Disengagement events are a contested, and some say outdated, metric for measuring the capabilities of an autonomous vehicle system. AI researchers often treat disengagements as a signal for troubleshooting or debugging navigation systems for delivery robots on sidewalks or autonomous vehicles on roads, but LaND treats disengagements as part of training data.</p>



<p class="wp-block-paragraph">Doing so, according to engineers from Berkeley AI Research, allows the robot to learn from datasets collected naturally during the testing process. Other systems have learned directly from training data gathered from onboard sensors, but researchers say that can require a lot of labeled data and be expensive.</p>



<p class="wp-block-paragraph">“Our results demonstrate LaND can successfully learn to navigate in diverse, real world sidewalk environments, outperforming both imitation learning and reinforcement learning approaches,” the paper reads. “Our key insight is that if the robot can successfully learn to execute actions that avoid disengagement, then the robot will successfully perform the desired task. Crucially, unlike conventional reinforcement learning algorithms, which use task-specific reward functions, our approach does not even need to know the task — the task is specified implicitly through the disengagement signal. However, similar to standard reinforcement learning algorithms, our approach continuously improves because our learning algorithm reinforces actions that avoid disengagements.”</p>



<p class="wp-block-paragraph">LaND uses reinforcement learning, but rather than seeking a reward, it treats each disengagement event as a signal to learn from directly, using input sensors like a camera while taking into account factors like steering angle and whether autonomy mode was engaged. The researchers detailed LaND in a paper published last week on the preprint repository arXiv, along with accompanying code.</p>
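<p class="wp-block-paragraph">The key insight quoted above, choosing actions that avoid disengagement, can be sketched roughly as follows. This is a toy illustration, not the authors' code: the function <em>predict_disengagement</em> is a hand-written stand-in for a learned model, and the lateral-offset state, the candidate plans and the probability formula are all invented for the example.</p>

```python
# Hypothetical sketch: the predictor below is a hand-written stand-in,
# not LaND's learned neural network.

def predict_disengagement(offset, steering_plan):
    """Return a made-up disengagement probability for each future step.

    offset: lateral distance from the sidewalk centre (metres, assumed).
    steering_plan: sequence of lateral shifts applied at each step.
    """
    probabilities = []
    for shift in steering_plan:
        offset += shift
        # Farther from the centre means a human takeover is more likely.
        probabilities.append(min(1.0, abs(offset) / 2.0))
    return probabilities


def choose_plan(offset, candidate_plans):
    """Pick the plan with the lowest total predicted disengagement."""
    return min(
        candidate_plans,
        key=lambda plan: sum(predict_disengagement(offset, plan)),
    )


candidates = [(-0.5, -0.5, -0.5), (0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
best = choose_plan(-1.0, candidates)
```

<p class="wp-block-paragraph">Starting one metre left of centre, the sketch selects the plan that steers back toward the middle of the sidewalk. No task reward appears anywhere: the only signal is the predicted chance of a disengagement.</p>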



<p class="wp-block-paragraph">The team collected training data to build LaND by driving a Clearpath Jackal robot on the sidewalks of Berkeley. A human safety driver escorted the robot to reset its course or take over driving for a short period if the robot drove into a street, driveway, or other obstacle. In all, nearly 35,000 data points were collected and nearly 2,000 disengagements were produced during the LaND training on Berkeley sidewalks. Delivery robot startup Kiwibot also operates at UC Berkeley and on nearby sidewalks.</p>



<p class="wp-block-paragraph">Compared with a deep reinforcement learning algorithm (Kendall et al.) and behavioral cloning, a common method of imitation learning, initial experiments showed that LaND traveled longer distances on sidewalks before disengaging.</p>



<p class="wp-block-paragraph">In future work, authors say LaND can be combined with existing navigation systems, particularly leading imitation learning methods that use data from experts for improved results. Investigating ways to have the robot alert its handlers when it needs human monitoring could lower costs.</p>



<p class="wp-block-paragraph">In other recent work focused on keeping training costs down for robotic systems, in August a group of UC Berkeley AI researchers created a simple method that uses an $18 reacher-grabber and a GoPro to collect training data for robotic grasping systems. Last year, Berkeley researchers including Pieter Abbeel, a coauthor of the LaND research, introduced Blue, a general-purpose robot that costs a fraction of the price of existing robot systems.</p>
<p>The post <a href="https://www.aiuniverse.xyz/researchers-detail-land-ai-that-learns-from-autonomous-vehicle-disengagements/">Researchers detail LaND, AI that learns from autonomous vehicle disengagements</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/researchers-detail-land-ai-that-learns-from-autonomous-vehicle-disengagements/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Researchers Develop ‘ActionBrain’ that Enables Self-thinking Robots</title>
		<link>https://www.aiuniverse.xyz/researchers-develop-actionbrain-that-enables-self-thinking-robots/</link>
					<comments>https://www.aiuniverse.xyz/researchers-develop-actionbrain-that-enables-self-thinking-robots/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Fri, 16 Oct 2020 06:49:23 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ActionBrain]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Robots]]></category>
		<category><![CDATA[Technology]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=12266</guid>

					<description><![CDATA[<p>Source: koreabizwire.com South Korean researchers have developed a new artificial intelligence (AI) technology that enables robots to complete their assigned missions by making independent decisions on what <a class="read-more-link" href="https://www.aiuniverse.xyz/researchers-develop-actionbrain-that-enables-self-thinking-robots/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/researchers-develop-actionbrain-that-enables-self-thinking-robots/">Researchers Develop ‘ActionBrain’ that Enables Self-thinking Robots</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: koreabizwire.com</p>



<p class="wp-block-paragraph">South Korean researchers have developed a new artificial intelligence (AI) technology that enables robots to complete their assigned missions by making independent decisions on what actions to take.</p>



<p class="wp-block-paragraph">The state-run Electronics and Telecommunications Research Institute (ETRI) announced on Tuesday that it had developed ‘ActionBrain’ technology that clears the way for robots to make their own decisions on whether to take independent or collaborative actions to carry out their missions.</p>



<p class="wp-block-paragraph">In Internet of Things (IoT) ecosystems, the way things operate, in general, is based on the rules pre-set by developers.</p>



<p class="wp-block-paragraph">This way of operation, however, has limits particularly when an unexpected situation occurs or when the environment dramatically changes, with devices having difficulty adapting to new situations.</p>



<p class="wp-block-paragraph">Against this backdrop, the ETRI developed the ActionBrain technology by applying deep learning technologies such as imitation learning and reinforcement learning.</p>



<p class="wp-block-paragraph">If this technology is applied to production robots at smart factories, they can develop so-called behavior intelligence for collaborative production, while being able to communicate with other robots.</p>



<p class="wp-block-paragraph">Even when the actual working conditions are changed from pre-set conditions, the robots can self-correct to ensure optimal operation.</p>



<p class="wp-block-paragraph">This technology can also be applied to unmanned surveillance drones that can be used to monitor natural disasters or other emergencies.</p>
<p>The post <a href="https://www.aiuniverse.xyz/researchers-develop-actionbrain-that-enables-self-thinking-robots/">Researchers Develop ‘ActionBrain’ that Enables Self-thinking Robots</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/researchers-develop-actionbrain-that-enables-self-thinking-robots/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>HCMUS wins AI contest for technological engineers</title>
		<link>https://www.aiuniverse.xyz/hcmus-wins-ai-contest-for-technological-engineers/</link>
					<comments>https://www.aiuniverse.xyz/hcmus-wins-ai-contest-for-technological-engineers/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Fri, 09 Oct 2020 06:05:43 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[HCMUS]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[technological engineers]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=12071</guid>

					<description><![CDATA[<p>Source: sggpnews.org.vn This is the first time in Vietnam a simulation contest combining AI algorithm with Esport has been held. The Arena, named ‘Treasure Island’, follows the <a class="read-more-link" href="https://www.aiuniverse.xyz/hcmus-wins-ai-contest-for-technological-engineers/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/hcmus-wins-ai-contest-for-technological-engineers/">HCMUS wins AI contest for technological engineers</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: sggpnews.org.vn</p>



<p class="wp-block-paragraph">This is the first time in Vietnam that a simulation contest combining AI algorithms with esports has been held. The arena, named ‘Treasure Island’, followed a one-on-one combat model to select the 8 most prominent teams. These competitors then went through 15 final rounds to identify the winner.</p>



<p class="wp-block-paragraph">Among the contestants in the final match, 4 created a virtual agent based on a reinforcement learning (RL) algorithm, while the rest employed non-AI algorithms.</p>



<p class="wp-block-paragraph">Experts in the field commented that the contest was a fight between AI and real humans. Most importantly, all teams needed to program their agent so that it could learn by itself and make strategic decisions on its own. The agent also had to update its strategy while observing opponents’ moves in order to beat them and win the game.</p>



<p class="wp-block-paragraph">In the end, Black Panther from HCMUS surpassed the other contestants and was named champion of the arena. The team received a cash prize of VND100 million (approx. US$4,310) and will attend an AI workshop worth VND20 million ($862) held by FPT Software.</p>



<p class="wp-block-paragraph">Second place went to Trusted-AI, who used an RL algorithm to train their agent to learn from experience and choose the best moves for a specific environment.</p>



<p class="wp-block-paragraph">Besides the main prizes, the organizing board also presented awards for the Most Talented and the Most Promising teams, worth VND20 million ($862) and VND10 million ($431) respectively. Both awards went to teams made up entirely of students.</p>



<p class="wp-block-paragraph">Mr. Nguyen Do Van from AI Academy Vietnam, one of the judges in the contest, shared his hope that such competitions will in future identify more inspiring, innovative algorithms to tackle current problems in the community.</p>



<p class="wp-block-paragraph">The contest was held from August to September 2020, attracting 445 teams of nearly 1,000 people from all over the nation and other countries like Japan, the Republic of Korea, Germany, and the US.</p>
<p>The post <a href="https://www.aiuniverse.xyz/hcmus-wins-ai-contest-for-technological-engineers/">HCMUS wins AI contest for technological engineers</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/hcmus-wins-ai-contest-for-technological-engineers/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Google Teases Large Scale Reinforcement Learning Infrastructure</title>
		<link>https://www.aiuniverse.xyz/google-teases-large-scale-reinforcement-learning-infrastructurean/</link>
					<comments>https://www.aiuniverse.xyz/google-teases-large-scale-reinforcement-learning-infrastructurean/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Wed, 07 Oct 2020 06:40:44 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[applications]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[training]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11998</guid>

					<description><![CDATA[<p>Source: analyticsindiamag.com The current state-of-the-art reinforcement learning techniques require many iterations over many samples from the environment to learn a target task. For instance, the game Dota <a class="read-more-link" href="https://www.aiuniverse.xyz/google-teases-large-scale-reinforcement-learning-infrastructurean/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/google-teases-large-scale-reinforcement-learning-infrastructurean/">Google Teases Large Scale Reinforcement Learning Infrastructurean</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: analyticsindiamag.com</p>



<p class="wp-block-paragraph">The current state-of-the-art reinforcement learning techniques require many iterations over many samples from the environment to learn a target task. For instance, the agent trained to play the game Dota 2 learns from batches of 2 million frames every 2 seconds. Infrastructure that handles RL at this scale should not only be good at collecting a large number of samples, but also be able to iterate quickly over them during training. Doing so efficiently requires overcoming a few common challenges:</p>



<p class="wp-block-paragraph">The learner must service a growing number of read requests from actors for model retrieval as the number of actors increases.<br>Processor performance is often restricted by the efficiency of the input pipeline in feeding training data to the compute cores.<br>As the number of compute cores increases, the performance of the input pipeline becomes even more critical for the overall training runtime.</p>



<p class="wp-block-paragraph">To address these challenges, Google has introduced Menger, a large-scale distributed reinforcement learning infrastructure with localised inference. It can scale up to several thousand actors across multiple processing clusters, reducing the overall training time in the task of chip placement. Chip placement, or chip floor planning, is time-consuming and manual. Earlier this year, Google demonstrated how chip placement could be approached through deep reinforcement learning to bring down the time needed to design a chip.</p>



<p class="wp-block-paragraph">With Menger, Google tested scalability and efficiency using TPU accelerators on chip placement tasks.</p>



<h3 class="wp-block-heading"><strong>How It Works</strong></h3>



<p class="wp-block-paragraph">The above illustration is an overview of a distributed RL system with multiple actors placed in different Borg cells. Google’s Borg system, described publicly in a 2015 paper, is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across tens of thousands of machines. With increasing updates from multiple actors within an environment, the communication between learner and actors is throttled, and this leads to an increase in convergence time.</p>



<p class="wp-block-paragraph">The main responsibility here, wrote the researchers, is maintaining a balance between a large number of requests from actors and the learner job. They also state that adding caching components not only reduces the pressure on the learner to service the read requests but also further distributes the actors across multiple Borg cells. This, in turn, reduces computation overhead.</p>
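<p class="wp-block-paragraph">The effect of a caching component on the learner's read load can be sketched in a few lines of Python. This is an illustrative toy, not Menger's implementation: the classes <em>Learner</em> and <em>ModelCache</em> and the refresh-every-N-requests policy are assumptions chosen to keep the example small.</p>

```python
# Hypothetical sketch of the caching idea; none of these classes are Menger APIs.

class Learner:
    """Holds the latest model version and counts how often it is read."""

    def __init__(self):
        self.version = 0
        self.reads = 0

    def publish(self):
        self.version += 1

    def read_model(self):
        self.reads += 1
        return self.version


class ModelCache:
    """Serves many actors from one cached copy.

    The copy is refreshed from the learner once every `refresh_every`
    requests (an assumed policy; a time-based refresh would also work).
    """

    def __init__(self, learner, refresh_every):
        self.learner = learner
        self.refresh_every = refresh_every
        self.requests = 0
        self.cached = None

    def get_model(self):
        if self.requests % self.refresh_every == 0:
            self.cached = self.learner.read_model()
        self.requests += 1
        return self.cached


learner = Learner()
learner.publish()
cache = ModelCache(learner, refresh_every=100)
models = [cache.get_model() for _ in range(1000)]  # 1,000 actor requests
```

<p class="wp-block-paragraph">In this sketch 1,000 actor requests result in only 10 reads against the learner. The trade-off is that actors may briefly act with a slightly stale model between refreshes.</p>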



<p class="wp-block-paragraph">Menger uses Reverb, an open-source data storage system designed for machine learning applications, which provides an efficient and flexible platform for implementing experience replay in a variety of on-policy and off-policy algorithms. The researchers state that a single Reverb replay buffer service does not scale well in a distributed RL setting with multiple actors and becomes inefficient. Reverb’s sharding helped balance the load from a large number of actors across multiple servers, instead of throttling a single replay buffer server, while minimising the latency for each replay buffer server.</p>
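<p class="wp-block-paragraph">As a rough picture of what sharding a replay buffer means, consider the sketch below. It does not use the Reverb API: <em>ShardedReplayBuffer</em> is a hypothetical in-memory class, and assigning actors to shards by <em>actor_id</em> modulo the shard count is just one simple assumed policy.</p>

```python
import random

# Hypothetical sketch of sharding; this is not the Reverb API.


class ShardedReplayBuffer:
    """Spreads experience from many actors across several independent shards."""

    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]

    def insert(self, actor_id, experience):
        # Each actor always writes to the same shard, spreading the write load.
        self.shards[actor_id % len(self.shards)].append(experience)

    def sample(self, rng):
        # Pick a non-empty shard at random, then an item from that shard.
        non_empty = [shard for shard in self.shards if shard]
        return rng.choice(rng.choice(non_empty))

    def shard_sizes(self):
        return [len(shard) for shard in self.shards]


buffer = ShardedReplayBuffer(num_shards=4)
for actor_id in range(8):
    for step in range(5):
        buffer.insert(actor_id, (actor_id, step))

sizes = buffer.shard_sizes()
item = buffer.sample(random.Random(0))
```

<p class="wp-block-paragraph">With 8 actors writing 5 items each into 4 shards, every shard ends up holding 10 items, so no single server absorbs all of the traffic.</p>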



<p class="wp-block-paragraph">The researchers claim that they have successfully used Menger infrastructure to drastically reduce the training time.</p>



<h4 class="wp-block-heading">Key Takeaways</h4>



<p class="wp-block-paragraph">Reinforcement learning applications have slowly found their way into unexpected domains, but implementing RL techniques is tricky. The performance-accuracy trade-off looms large in research. With Menger, the researchers have tried to address the shortcomings of RL infrastructure. Its promising results in the intricate task of chip placement have the potential to shorten the chip design cycle and to carry over to other challenging real-world tasks as well.</p>



<p class="wp-block-paragraph">Menger reduces the average read latency by a factor of about 4, leading to faster training iterations, especially for on-policy algorithms.<br>The efficient scaling of Menger is due to the sharding capability of Reverb.<br>Training time was reduced from about 8.6 hours to one hour compared to the state of the art.</p>
<p>The post <a href="https://www.aiuniverse.xyz/google-teases-large-scale-reinforcement-learning-infrastructurean/">Google Teases Large Scale Reinforcement Learning Infrastructurean</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/google-teases-large-scale-reinforcement-learning-infrastructurean/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Plan2Explore: Active Model-Building for Self-Supervised Visual Reinforcement Learning</title>
		<link>https://www.aiuniverse.xyz/plan2explore-active-model-building-for-self-supervised-visual-reinforcement-learning/</link>
					<comments>https://www.aiuniverse.xyz/plan2explore-active-model-building-for-self-supervised-visual-reinforcement-learning/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Tue, 06 Oct 2020 08:23:22 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[DeepMind Control Suite]]></category>
		<category><![CDATA[Future]]></category>
		<category><![CDATA[Plan2Explore]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11972</guid>

					<description><![CDATA[<p>Source: bair.berkeley.edu To operate successfully in unstructured open-world environments, autonomous intelligent agents need to solve many different tasks and learn new tasks quickly. Reinforcement learning has enabled <a class="read-more-link" href="https://www.aiuniverse.xyz/plan2explore-active-model-building-for-self-supervised-visual-reinforcement-learning/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/plan2explore-active-model-building-for-self-supervised-visual-reinforcement-learning/">Plan2Explore: Active Model-Building for Self-Supervised Visual Reinforcement Learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: bair.berkeley.edu</p>



<p class="wp-block-paragraph">To operate successfully in unstructured open-world environments, autonomous intelligent agents need to solve many different tasks and learn new tasks quickly. Reinforcement learning has enabled artificial agents to solve complex tasks both in simulation and in the real world. However, it requires collecting large amounts of experience in the environment for each individual task. Self-supervised reinforcement learning has emerged as an alternative, where the agent only follows an intrinsic objective that is independent of any individual task, analogously to unsupervised representation learning. After acquiring general and reusable knowledge about the environment through self-supervision, the agent can adapt to specific downstream tasks more efficiently.</p>



<p class="wp-block-paragraph">In this post, we explain our recent publication that develops Plan2Explore. While many recent papers on self-supervised reinforcement learning have focused on model-free agents, our agent learns an internal world model that predicts the future outcomes of potential actions. The world model captures general knowledge, allowing Plan2Explore to quickly solve new tasks through planning in its own imagination. The world model further enables the agent to explore what it expects to be novel, rather than repeating what it found novel in the past. Plan2Explore obtains state-of-the-art zero-shot and few-shot performance on continuous control benchmarks with high-dimensional input images. To make it easy to experiment with our agent, we are open-sourcing the complete source code.</p>



<h3 class="wp-block-heading" id="how-does-plan2explore-work">How does Plan2Explore work?</h3>



<p class="wp-block-paragraph">At a high level, Plan2Explore works by training a world model, exploring to maximize the information gain for the world model, and using the world model at test time to solve new tasks (see figure above). Thanks to effective exploration, the learned world model is general and captures information that can be used to solve multiple new tasks with no or few additional environment interactions. We discuss each part of the Plan2Explore algorithm individually below. We assume a basic understanding of reinforcement learning in this post and otherwise recommend these materials as an introduction.</p>



<h3 class="wp-block-heading" id="learning-the-world-model">Learning the world model</h3>



<p class="wp-block-paragraph">Plan2Explore learns a world model that predicts future outcomes given past observations <em>o</em><sub>1:t</sub> and actions <em>a</em><sub>1:t</sub> (see figure below). To handle high-dimensional image observations, we encode them into lower-dimensional features <em>h</em> and use an RSSM model that predicts forward in a compact latent state-space <em>s</em>, from which the observations can be decoded. The latent state aggregates information from past observations that is helpful for future prediction, and is learned end-to-end using a variational objective.</p>



<h3 class="wp-block-heading" id="a-novelty-metric-for-active-model-building">A novelty metric for active model-building</h3>



<p class="wp-block-paragraph">To learn an accurate and general world model we need an exploration strategy that collects new and informative data. To achieve this, Plan2Explore uses a novelty metric derived from the model itself. The novelty metric measures the expected information gained about the environment upon observing the new data. As the figure below shows, this is approximated by the disagreement of an ensemble of K latent models. Intuitively, large latent disagreement reflects high model uncertainty, and obtaining the data point would reduce this uncertainty. By maximizing latent disagreement, Plan2Explore selects actions that lead to the largest information gain, therefore improving the model as quickly as possible.</p>
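
<p class="wp-block-paragraph">As a rough illustration of the idea, the sketch below scores novelty as the variance across ensemble predictions, averaged over feature dimensions. The function name <code>latent_disagreement</code>, the list-of-lists layout, and the toy numbers are assumptions made for this example; they are not taken from the Plan2Explore code.</p>

```python
from statistics import pvariance


def latent_disagreement(predictions):
    """Mean per-dimension variance across ensemble predictions.

    predictions: list of K equal-length lists, one predicted feature
    vector per ensemble member (a hypothetical layout for illustration).
    """
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    dims = list(zip(*predictions))  # regroup values by feature dimension
    return sum(pvariance(d) for d in dims) / len(dims)


# Toy predictions from K = 3 ensemble members for two candidate inputs.
agree = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
disagree = [[0.0, 2.0], [1.0, 2.0], [2.0, 2.0]]

print(latent_disagreement(agree))
print(latent_disagreement(disagree))
```

<p class="wp-block-paragraph">The score is zero when every member predicts the same features and grows as the predictions spread apart, so an agent maximizing it is drawn toward inputs the ensemble is still uncertain about.</p>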



<h3 class="wp-block-heading" id="planning-for-future-novelty">Planning for future novelty</h3>



<p class="wp-block-paragraph">To effectively maximize novelty, we need to know which parts of the environment are still unexplored. Most prior work on self-supervised exploration used model-free methods that reinforce past behavior that resulted in novel experience. This makes these methods slow to explore: since they can only repeat exploration behavior that was successful in the past, they are unlikely to stumble onto something novel. In contrast, Plan2Explore plans for expected novelty by measuring model uncertainty of imagined future outcomes. By seeking trajectories that have the highest uncertainty, Plan2Explore explores exactly the parts of the environment that were previously unknown.</p>



<p class="wp-block-paragraph">To choose actions a that optimize the exploration objective, Plan2Explore leverages the learned world model as shown in the figure below. The actions are selected to maximize the expected novelty of the entire future sequence s<sub>t:T</sub>, using imaginary rollouts of the world model to estimate the novelty. To solve this optimization problem, we use the Dreamer agent, which learns a policy π<sub>ϕ</sub> using a value function and analytic gradients through the model. The policy is learned completely inside the imagination of the world model. During exploration, this imagination training ensures that our exploration policy is always up-to-date with the current world model and collects data that are still novel.</p>
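
<p class="wp-block-paragraph">To make "planning for expected novelty" concrete, here is a deliberately simplified sketch. It swaps Dreamer's learned policy and analytic gradients for an exhaustive search over short action sequences, and it swaps ensemble disagreement for a visited-set check. The one-dimensional toy world and every name in the code (<code>imagine</code>, <code>plan_for_novelty</code>, <code>toy_model</code>, <code>toy_novelty</code>) are assumptions for illustration only.</p>

```python
from itertools import product


def imagine(model, state, actions):
    """Roll the model forward without touching the real environment."""
    states = []
    for action in actions:
        state = model(state, action)
        states.append(state)
    return states


def plan_for_novelty(model, novelty, state, action_set, horizon):
    """Return the action sequence with the highest summed imagined novelty."""
    best_actions, best_score = None, float("-inf")
    for actions in product(action_set, repeat=horizon):
        score = sum(novelty(s) for s in imagine(model, state, actions))
        if score > best_score:
            best_actions, best_score = actions, score
    return best_actions, best_score


# Toy 1-D world: the agent has already seen positions 0, 1 and 2.
visited = {0, 1, 2}


def toy_model(state, action):
    return state + action


def toy_novelty(state):
    # Stand-in for model uncertainty: unseen positions count as novel.
    return 0.0 if state in visited else 1.0


plan, score = plan_for_novelty(toy_model, toy_novelty, 0, (-1, 1), 3)
print(plan, score)
```

<p class="wp-block-paragraph">Starting at position 0, the search prefers moving left three times, because every imagined state on that path is unseen, whereas moving right first passes through positions the agent already knows.</p>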



<h3 class="wp-block-heading" id="curiosity-driven-exploration-behavior">Curiosity-driven exploration behavior</h3>



<p class="wp-block-paragraph">We evaluate Plan2Explore on 20 continuous control tasks from the DeepMind Control Suite. The agent only has access to image observations and no proprioceptive information. Instead of random exploration, which fails to take the agent far from the initial position, Plan2Explore leads to diverse movement strategies like jumping, running, and flipping. Later, we will see that these are effective practice episodes that enable the agent to quickly learn to solve various continuous control tasks.</p>



<h3 class="wp-block-heading" id="solving-tasks-with-the-world-model">Solving tasks with the world model</h3>



<p class="wp-block-paragraph">Once an accurate and general world model is learned, we test Plan2Explore on previously unseen tasks. Given a task specified with a reward function, we use the model to optimize a policy for that task. Similar to our exploration procedure, we optimize a new value function and a new policy head for the downstream task. This optimization uses only predictions imagined by the model, enabling Plan2Explore to solve new downstream tasks in a zero-shot manner without any additional interaction with the world.</p>



<p class="wp-block-paragraph">The following plot shows the performance of Plan2Explore on tasks from DM Control Suite. Before 1 million environment steps, the agent doesn’t know the task and simply explores. The agent solves the task as soon as it is provided at 1 million steps, and keeps improving fast in a few-shot regime after that.</p>



<p class="wp-block-paragraph">Plan2Explore is able to solve most of the tasks we benchmarked. Since prior work on self-supervised reinforcement learning used model-free agents that are not able to adapt in a zero-shot manner (ICM), or did not use image observations, we compare by adapting this prior work to our model-based Plan2Explore setup. Our latent disagreement objective outperforms other previously proposed objectives. More interestingly, the final performance of Plan2Explore is comparable to the state-of-the-art oracle agent that requires task rewards throughout training. In our paper, we further report performance of Plan2Explore in the zero-shot setting where the agent needs to solve the task before any task-oriented practice.</p>



<h3 class="wp-block-heading" id="future-directions">Future directions</h3>



<p class="wp-block-paragraph">Plan2Explore demonstrates that effective behavior can be learned through self-supervised exploration only. This opens multiple avenues for future research:</p>



<ul class="wp-block-list"><li>First, to apply self-supervised RL to a variety of settings, future work will investigate different ways of specifying the task and deriving behavior from the world model. For example, the task could be specified with a demonstration or a description of the desired goal state, or communicated to the agent in natural language.</li><li>Second, while Plan2Explore is completely self-supervised, in many cases a weak supervision signal is available, such as in hard exploration games, human-in-the-loop learning, or real life. In such a semi-supervised setting, it is interesting to investigate how weak supervision can be used to steer exploration towards the relevant parts of the environment.</li><li>Finally, Plan2Explore has the potential to improve the data efficiency of real-world robotic systems, where exploration is costly and time-consuming, and the final task is often unknown in advance.</li></ul>



<p class="wp-block-paragraph">By designing a scalable way of planning to explore in unstructured environments with visual observations, Plan2Explore provides an important step toward self-supervised intelligent machines.</p>
<p>The post <a href="https://www.aiuniverse.xyz/plan2explore-active-model-building-for-self-supervised-visual-reinforcement-learning/">Plan2Explore: Active Model-Building for Self-Supervised Visual Reinforcement Learning</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/plan2explore-active-model-building-for-self-supervised-visual-reinforcement-learning/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Is AI an Existential Threat?</title>
		<link>https://www.aiuniverse.xyz/is-ai-an-existential-threat/</link>
					<comments>https://www.aiuniverse.xyz/is-ai-an-existential-threat/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Mon, 05 Oct 2020 08:39:43 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[autonomous]]></category>
		<category><![CDATA[Technology]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11924</guid>

					<description><![CDATA[<p>Source: unite.ai When discussing Artificial Intelligence (AI), a common debate is whether AI is an existential threat. The answer requires understanding the technology behind Machine Learning (ML), and recognizing <a class="read-more-link" href="https://www.aiuniverse.xyz/is-ai-an-existential-threat/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/is-ai-an-existential-threat/">Is AI an Existential Threat?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: unite.ai</p>



<p class="wp-block-paragraph">When discussing Artificial Intelligence (AI), a common debate is whether AI is an existential threat. The answer requires understanding the technology behind Machine Learning (ML) and recognizing that humans have a tendency to anthropomorphize. We will explore two different types of AI: Artificial Narrow Intelligence (ANI), which is available now and is cause for concern, and Artificial General Intelligence (AGI), the threat most commonly associated with apocalyptic renditions of AI.</p>



<h3 class="wp-block-heading">Artificial Narrow Intelligence Threats</h3>



<p class="wp-block-paragraph">To understand what ANI is, you simply need to understand that every single AI application that is currently available is a form of ANI. These are AI systems with a narrow field of specialty. For example, autonomous vehicles use AI that is designed with the sole purpose of moving a vehicle from point A to B. Another type of ANI might be a chess program that is optimized to play chess; even if the chess program continuously improves itself by using&nbsp;<a href="https://www.unite.ai/what-is-reinforcement-learning/">reinforcement learning</a>, it will never be able to operate an autonomous vehicle.</p>



<p class="wp-block-paragraph">Because each ANI system is focused on whatever operation it is responsible for, it is unable to use generalized learning in order to take over the world. That is the good news; the bad news is that, because it relies on a human operator, the AI system is susceptible to biased data, human error, or even worse, a rogue human operator.</p>



<h3 class="wp-block-heading">AI Surveillance</h3>



<p class="wp-block-paragraph">There may be no greater danger to humanity than humans using AI to invade privacy, and in some cases using AI surveillance to completely prevent people from moving freely. China, Russia, and other nations passed regulations during COVID-19 that enable them to monitor and control the movement of their respective populations. These are laws that, once in place, are difficult to remove, especially in societies that feature autocratic leaders.</p>



<p class="wp-block-paragraph">In China, cameras are stationed outside of people’s homes, and in some cases inside them. Each time a member of the household leaves, an AI monitors the time of arrival and departure, and if necessary alerts the authorities. As if that were not sufficient, with the assistance of facial recognition technology, China is able to track the movement of each person every time they are identified by a camera. This offers absolute power to the entity controlling the AI, and absolutely zero recourse to its citizens.</p>



<p class="wp-block-paragraph">This scenario is dangerous because corrupt governments can carefully monitor the movements of journalists, political opponents, or anyone who dares to question the authority of the government. It is easy to understand how journalists and citizens would be cautious about criticizing governments when every movement is being monitored.</p>



<p class="wp-block-paragraph">Fortunately, many cities are fighting to keep facial recognition out. Notably, Portland, Oregon has recently passed a law that blocks facial recognition from being used unnecessarily in the city. While these changes in regulation may have gone unnoticed by the general public, in the future these regulations could be the difference between cities that offer some type of autonomy and freedom, and cities that feel oppressive.</p>



<h3 class="wp-block-heading">Autonomous Weapons and Drones</h3>



<p class="wp-block-paragraph">Over 4500 AI researchers have been calling for a ban on autonomous weapons and have created the Ban Lethal Autonomous Weapons website. The group has many notable non-profits as signatories, such as Human Rights Watch, Amnesty International, and the Future of Life Institute, which itself has a stellar scientific advisory board including Elon Musk, Nick Bostrom, and Stuart Russell.</p>



<p class="wp-block-paragraph">Before continuing, I will share this quote from The Future of Life Institute which best explains why there is clear cause for concern: “In contrast to semi-autonomous weapons that require human oversight to ensure that each target is validated as ethically and legally legitimate, such fully autonomous weapons select and engage targets without human intervention, representing complete automation of lethal harm.”</p>



<p class="wp-block-paragraph">Currently, smart bombs are deployed with a target selected by a human, and the bomb then uses AI to plot a course and to land on its target. The problem is what happens when we decide to completely remove the human from the equation?</p>



<p class="wp-block-paragraph">When an AI chooses which humans to target, as well as the type of collateral damage that is deemed acceptable, we may have crossed a point of no return. This is why so many AI researchers are opposed to researching anything that is remotely related to autonomous weapons.</p>



<p class="wp-block-paragraph">There are multiple problems with simply attempting to block autonomous weapons research. The first problem is that even if advanced nations such as Canada, the USA, and most of Europe agree to the ban, it doesn’t mean rogue nations such as China, North Korea, Iran, and Russia will play along. The second and bigger problem is that AI research and applications that are designed for use in one field may be used in a completely unrelated field.</p>



<p class="wp-block-paragraph">For example, computer vision continuously improves and is important for developing autonomous vehicles, precision medicine, and other important use cases. It is also fundamentally important for regular drones or drones which could be modified to become autonomous. One potential use case of advanced drone technology is developing drones that can monitor and fight forest fires. This would completely remove firefighters from harm’s way. In order to do this, you would need to build drones that are able to fly into harm’s way, navigate in low or zero visibility, and drop water with impeccable precision. It is not a far stretch to then use this identical technology in an autonomous drone that is designed to selectively target humans.</p>



<p class="wp-block-paragraph">It is a dangerous predicament, and at this point in time no one fully understands the implications of advancing or attempting to block the development of autonomous weapons. It is nonetheless something that we need to keep our eyes on; enhancing whistleblower protection may enable those in the field to report abuses.</p>



<p class="wp-block-paragraph">Rogue operator aside, what happens if AI bias creeps into AI technology that is designed to be an autonomous weapon?</p>



<h3 class="wp-block-heading">AI Bias</h3>



<p class="wp-block-paragraph">One of the most underreported threats of AI is AI bias. This is simple to understand, as most of it is unintentional. AI bias slips in when an AI reviews data that is fed to it by humans: using pattern recognition on that data, the AI incorrectly reaches conclusions that may have negative repercussions on society. For example, an AI that is fed literature from the past century on how to identify medical personnel may reach the unwanted sexist conclusion that women are always nurses, and men are always doctors.</p>



<p class="wp-block-paragraph">A more dangerous scenario is when AI that is used to sentence convicted criminals is biased towards giving longer prison sentences to minorities. The AI’s criminal risk assessment algorithms are simply studying patterns in the data that has been fed into the system. This data indicates that historically certain minorities are more likely to re-offend, even when this is due to poor datasets which may be influenced by police racial profiling. The biased AI then reinforces negative human policies. This is why AI should be a guideline, never judge and jury.</p>



<p class="wp-block-paragraph">Returning to autonomous weapons, if we have an AI which is biased against certain ethnic groups, it could choose to target certain individuals based on biased data, and it could go so far as ensuring that any type of collateral damage impacts certain demographics less than others. For example, when targeting a terrorist, before attacking it could wait until the terrorist is surrounded by those who follow the Muslim faith instead of Christians.</p>



<p class="wp-block-paragraph">Fortunately, it has been proven that AI that is designed by diverse teams is less prone to bias. This is reason enough for enterprises to attempt, whenever possible, to hire a diverse, well-rounded team.</p>



<h3 class="wp-block-heading">Artificial General Intelligence Threats</h3>



<p class="wp-block-paragraph">It should be stated that while AI is advancing at an exponential pace, we have still not achieved AGI. When we will reach AGI is up for debate, and everyone has a different answer as to a timeline. I personally subscribe to the views of Ray Kurzweil, inventor, futurist, and author of “The Singularity Is Near,” who believes that we will have achieved AGI by 2029.</p>



<p class="wp-block-paragraph">AGI will be the most transformational technology in the world. Within weeks of AI achieving human-level intelligence, it will then reach superintelligence which is defined as intelligence that far surpasses that of a human.</p>



<p class="wp-block-paragraph">With this level of intelligence an AGI could quickly absorb all human knowledge and use pattern recognition to identify biomarkers that cause health issues, and then treat those conditions by using data science. It could create nanobots that enter the bloodstream to target cancer cells or other attack vectors. The list of accomplishments an AGI is capable of is infinite. We’ve previously explored some of the benefits of AGI.</p>



<p class="wp-block-paragraph">The problem is that humans may no longer be able to control the AI. Elon Musk describes it this way: “With artificial intelligence we are summoning the demon.” The question is whether we will be able to control this demon.</p>



<p class="wp-block-paragraph">Achieving AGI may simply be impossible until an AI leaves a simulation setting to truly interact in our open-ended world. Self-awareness cannot be designed; instead, it is believed that an emergent consciousness is likely to evolve when an AI has a robotic body featuring multiple input streams. These inputs may include tactile stimulation, voice recognition with enhanced natural language understanding, and augmented computer vision.</p>



<p class="wp-block-paragraph">The advanced AI may be programmed with altruistic motives and want to save the planet. Unfortunately, the AI may use data science, or even a decision tree, to arrive at unwanted faulty logic, such as assessing that it is necessary to sterilize humans or eliminate some of the human population in order to control human overpopulation.</p>



<p class="wp-block-paragraph">Careful thought and deliberation are needed when building an AI with intelligence that will far surpass that of a human. Many nightmare scenarios have been explored.</p>



<p class="wp-block-paragraph">Professor Nick Bostrom, in the Paperclip Maximizer argument, has argued that a misconfigured AGI, if instructed to produce paperclips, would simply consume all of Earth’s resources to produce these paperclips. While this seems a little far-fetched, a more pragmatic viewpoint is that an AGI could be controlled by a rogue state or a corporation with poor ethics. This entity could train the AGI to maximize profits, and in this case with poor programming and zero remorse it could choose to bankrupt competitors, destroy supply chains, hack the stock market, liquidate bank accounts, or attack political opponents.</p>



<p class="wp-block-paragraph">This is when we need to remember that humans tend to anthropomorphize. We cannot give the AI human-type emotions, wants, or desires. While there are diabolical humans who kill for pleasure, there is no reason to believe that an AI would be susceptible to this type of behavior. It is inconceivable for humans to even consider how an AI would view the world.</p>



<p class="wp-block-paragraph">Instead what we need to do is teach AI to always be deferential to a human. The AI should always have a human confirm any changes in settings, and there should always be a fail-safe mechanism. Then again, it has been argued that AI will simply replicate itself in the cloud, and by the time we realize it is self-aware it may be too late.</p>



<p class="wp-block-paragraph">This is why it is so important to open source as much AI as possible and to have rational discussions regarding these issues.</p>



<h3 class="wp-block-heading">Summary</h3>



<p class="wp-block-paragraph">There are many challenges to AI. Fortunately, we still have many years to collectively figure out the future path that we want AGI to take. In the short term, we should focus on creating a diverse AI workforce that includes as many women as men, and as many ethnic groups with diverse points of view as possible.</p>



<p class="wp-block-paragraph">We should also create whistleblower protections for researchers that are working on AI, and we should pass laws and regulations which prevent widespread abuse of state or company-wide surveillance. Humans have a once-in-a-lifetime opportunity to improve the human condition with the assistance of AI; we just need to ensure that we carefully create a societal framework that best enables the positives while mitigating the negatives, which include existential threats.</p>
<p>The post <a href="https://www.aiuniverse.xyz/is-ai-an-existential-threat/">Is AI an Existential Threat?</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/is-ai-an-existential-threat/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Engineers Develop New Machine-Learning Method Capable of Cutting Energy Use</title>
		<link>https://www.aiuniverse.xyz/engineers-develop-new-machine-learning-method-capable-of-cutting-energy-use/</link>
					<comments>https://www.aiuniverse.xyz/engineers-develop-new-machine-learning-method-capable-of-cutting-energy-use/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Mon, 28 Sep 2020 07:32:34 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Develop]]></category>
		<category><![CDATA[ENGINEERS]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11805</guid>

					<description><![CDATA[<p>Source:unite.ai Engineers at Swiss Center for Electronics and Microtechnology have developed a new machine-learning method capable of cutting energy use, as well as allowing artificial intelligence (AI) <a class="read-more-link" href="https://www.aiuniverse.xyz/engineers-develop-new-machine-learning-method-capable-of-cutting-energy-use/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/engineers-develop-new-machine-learning-method-capable-of-cutting-energy-use/">Engineers Develop New Machine-Learning Method Capable of Cutting Energy Use</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: unite.ai</p>



<p class="wp-block-paragraph">Engineers at the Swiss Center for Electronics and Microtechnology (CSEM) have developed a new machine-learning method capable of cutting energy use, as well as allowing artificial intelligence (AI) to complete tasks that were once considered too sensitive.</p>



<h3 class="wp-block-heading"><strong>Reinforcement Learning Limitations</strong></h3>



<p class="wp-block-paragraph">Reinforcement learning, where a computer continuously improves upon itself by learning from its past experiences, is a major aspect of artificial intelligence. However, this technology is oftentimes difficult to apply to real-life scenarios and situations, such as training climate-control systems. Applications such as this are not able to deal with the drastic changes in temperature that reinforcement learning would bring on.</p>



<p class="wp-block-paragraph">This exact issue is what the CSEM engineers set out to address, and that is when they came up with the new approach. The engineers demonstrated that simplified theoretical models could first be used to train computers, which would then turn to real-life systems. This allows the machine learning process to be more accurate by the time it reaches the real-life system, because it has already learned from trial and error with the theoretical model. This means that there will be no drastic fluctuations for the real-life system, solving the example issue with climate-control technology.</p>



<p class="wp-block-paragraph">Pierre-Jean Alet is head of smart energy systems research at CSEM, as well as co-author of the study.</p>



<p class="wp-block-paragraph">“It’s like learning the driver’s manual before you start a car,” Alet says. “With this pre-training step, computers build up a knowledge base they can draw on so they aren’t flying blind as they search for the right answer.”</p>



<h3 class="wp-block-heading"><strong>Energy Cuts</strong></h3>



<p class="wp-block-paragraph">One of the most important aspects of this new method is that it can cut energy use by over 20%. The engineers tested the method on a heating, ventilation and air conditioning (HVAC) system, which was located in a 100-room building.</p>



<p class="wp-block-paragraph">The engineers relied on three steps, the first of which was training a computer on a “virtual model.” This model was constructed through simple equations explaining the behavior of the building. Real building data such as temperature, weather conditions and other variables were then fed to the computer, which resulted in more accurate training. The last step was to allow the computer to run the reinforcement learning algorithms, which would eventually result in the best approach forward for the HVAC system.</p>
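
<p class="wp-block-paragraph">The first two steps can be pictured with a small sketch. It assumes a one-equation thermal model and fits a single heat-loss coefficient to logged data by least squares. The model form, the function names (<code>simulate_step</code>, <code>fit_loss_coefficient</code>), and the synthetic "measurements" are illustrative assumptions, not the CSEM method, and the reinforcement learning step is left out.</p>

```python
def simulate_step(temp, outdoor, heat, k, gain):
    """One step of a first-order thermal model (an assumed, simplified form)."""
    return temp + k * (outdoor - temp) + gain * heat


def fit_loss_coefficient(log, gain):
    """Least-squares estimate of k from (temp, outdoor, heat, next_temp) rows."""
    num = den = 0.0
    for temp, outdoor, heat, next_temp in log:
        x = outdoor - temp                  # driving temperature difference
        y = next_temp - temp - gain * heat  # change not explained by heating
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("log has no indoor/outdoor temperature difference")
    return num / den


# Synthetic log generated with k = 0.1, standing in for real building data.
true_k, gain = 0.1, 2.0
log = []
temp = 20.0
for outdoor, heat in [(5.0, 0.0), (5.0, 1.0), (10.0, 0.5), (0.0, 1.0)]:
    next_temp = simulate_step(temp, outdoor, heat, true_k, gain)
    log.append((temp, outdoor, heat, next_temp))
    temp = next_temp

estimated_k = fit_loss_coefficient(log, gain)
print(round(estimated_k, 6))
```

<p class="wp-block-paragraph">A controller trained against a calibrated model like this starts its trial and error in simulation, which is the property that spares the real building from drastic temperature swings.</p>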



<p class="wp-block-paragraph">The new method developed by the CSEM engineers could have big implications for machine learning. Many applications that were once thought to be “untouchable” by reinforcement learning, like those with large fluctuations, could now be approached in a new manner. This would result in lower energy usage, lower financial costs and many other benefits.</p>



<p class="wp-block-paragraph">The research, titled “A hybrid learning method for system identification and optimal control,” was published in the journal IEEE Transactions on Neural Networks and Learning Systems.</p>



<p class="wp-block-paragraph">The authors include: Baptiste Schubnel, Rafael E. Carrillo, Pierre-Jean Alet and Andreas Hutter.</p>
<p>The post <a href="https://www.aiuniverse.xyz/engineers-develop-new-machine-learning-method-capable-of-cutting-energy-use/">Engineers Develop New Machine-Learning Method Capable of Cutting Energy Use</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/engineers-develop-new-machine-learning-method-capable-of-cutting-energy-use/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>IST researchers exploit vulnerabilities of AI-powered game bots</title>
		<link>https://www.aiuniverse.xyz/ist-researchers-exploit-vulnerabilities-of-ai-powered-game-bots/</link>
					<comments>https://www.aiuniverse.xyz/ist-researchers-exploit-vulnerabilities-of-ai-powered-game-bots/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Thu, 24 Sep 2020 06:54:11 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[IST]]></category>
		<category><![CDATA[researchers]]></category>
		<category><![CDATA[Technology]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11717</guid>

					<description><![CDATA[<p>Source: news.psu.edu UNIVERSITY PARK, Pa. — If you’ve ever played an online video game, you’ve likely competed with a bot — an AI-driven program that plays on <a class="read-more-link" href="https://www.aiuniverse.xyz/ist-researchers-exploit-vulnerabilities-of-ai-powered-game-bots/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/ist-researchers-exploit-vulnerabilities-of-ai-powered-game-bots/">IST researchers exploit vulnerabilities of AI-powered game bots</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: news.psu.edu</p>



<p class="wp-block-paragraph">UNIVERSITY PARK, Pa. — If you’ve ever played an online video game, you’ve likely competed with a bot — an AI-driven program that plays on behalf of a human.</p>



<p class="wp-block-paragraph">Many of these bots are created using deep reinforcement learning, which is the training of algorithms to learn how to achieve a complex goal through a reward system. But, according to researchers in the College of Information Sciences and Technology at Penn State, using game bots trained by deep reinforcement learning could allow attackers to use deception to easily defeat them.</p>



<p class="wp-block-paragraph">To highlight this risk, the researchers designed an algorithm to train an adversarial bot, which was able to automatically discover and exploit weaknesses of master game bots driven by reinforcement learning algorithms. Their bot was then trained to defeat a world-class AI bot in the award-winning computer game StarCraft II.</p>



<p class="wp-block-paragraph">“This is the first attack that demonstrates its effectiveness in real-world video games,” said Wenbo Guo, a doctoral student studying information sciences and technology. “With the success of deep reinforcement learning in some popular games, like AlphaGo in the game Go and AlphaStar in StarCraft, more and more games are starting to use deep reinforcement learning to train their game bots.”</p>



<p class="wp-block-paragraph">He added, “Our work discloses the security threat of using deep reinforcement learning trained agents as game bots. It will make game developers be more careful about adopting deep reinforcement learning agents.”</p>



<p class="wp-block-paragraph">Guo and his research team presented their algorithm in August at Black Hat USA – a conference that is part of the most technical and relevant information security event series in the world. They also publicly released their code and a variety of adversarial AI bots.</p>



<p class="wp-block-paragraph">“By using our code, researchers and white-hat hackers could train their own adversarial agents to master many — if not all — multi-party video games,” said Xinyu Xing, assistant professor of information sciences and technology at Penn State.</p>



<p class="wp-block-paragraph">Guo concluded, “More importantly, game developers could use it to discover the vulnerabilities of their game bots and take rapid action to patch those vulnerabilities.”</p>



<p class="wp-block-paragraph">In addition to Xing, Guo worked with Xian Wu, a doctoral student studying informatics at Penn State, and Jimmy Su, senior director of the JD Security Research Center, to develop the algorithm.</p>
<p>The post <a href="https://www.aiuniverse.xyz/ist-researchers-exploit-vulnerabilities-of-ai-powered-game-bots/">IST researchers exploit vulnerabilities of AI-powered game bots</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/ist-researchers-exploit-vulnerabilities-of-ai-powered-game-bots/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Top 5 Technologies that One Should Master in 2020</title>
		<link>https://www.aiuniverse.xyz/top-5-technologies-that-one-should-master-in-2020/</link>
					<comments>https://www.aiuniverse.xyz/top-5-technologies-that-one-should-master-in-2020/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Sat, 12 Sep 2020 09:52:31 +0000</pubDate>
				<category><![CDATA[Reinforcement Learning]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Technologies]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=11539</guid>

					<description><![CDATA[<p>Source: how2shout.com In this world where there is competition everywhere for example in terms of education, sports, jobs, etc. and people always try to show their best <a class="read-more-link" href="https://www.aiuniverse.xyz/top-5-technologies-that-one-should-master-in-2020/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/top-5-technologies-that-one-should-master-in-2020/">Top 5 Technologies that One Should Master in 2020</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Source: how2shout.com</p>



<p class="wp-block-paragraph">Competition is everywhere in today’s world, in education, sports, jobs and more, and people always try to do their best to succeed by outperforming others. It can be thought of as a war going on around the world under a different name: competition. This war will not end for as long as the earth exists. So, the gist is that we need to stay ahead of everybody in the domain we are interested in. With that in mind, let’s turn to today’s topic, which brings competition and technology together. Today, the very first area in which people compete is how tech-savvy they are, that is, how much technical knowledge and expertise they have to solve problems. Everyone is trying to engineer new things or upgrade obsolete ones.</p>



<p class="wp-block-paragraph">Many technologies are available to help with these upgrades. Most of them relate in one way or another to computer applications, and here we will discuss the top 5 technologies that a person needs to learn to stay ahead in 2020. Let’s begin!</p>



<ol class="wp-block-list"><li><strong>Artificial Intelligence:&nbsp;</strong>The very first and topmost technology that one should know. It revolves around automating machinery and creating humanoids that can help people with any kind of task. The field dates back to the 1950s and has evolved so much since then that people are shifting their careers towards it. It consists of various sub-fields that one can master to achieve the desired success, including machine learning, deep learning, and the subsets of deep learning. With this technology in hand, one can land a dream job and earn high pay, because it is in demand in every employment sector of the world.</li><li><strong>Internet of Things (IoT):&nbsp;</strong>Yet another technology that is closely linked with AI. The idea behind it is to connect various devices to a single parent device that monitors everything taking place within them and fixes underlying issues, if any. The idea is also to gather and share different kinds of data so that there are no discrepancies in it. Most gadgets that we use, from mobile devices to washing machines, are linked to IoT in one way or another. IoT can even help in minimizing electricity consumption, contributing to cleaner and greener cities. With this technology in hand, one can succeed with flying colors and get a highly paid job.</li><li><strong>Augmented Reality and Virtual Reality:&nbsp;</strong>These are among the most widely used technologies today. Many researchers and scientists prefer to rehearse their work in augmented reality before carrying it out in the real world. Many AI-based environments, especially in reinforcement learning, use these technologies to test a humanoid in a virtual environment first and then deploy it in the real world. Today AR and VR are also being implemented in mobile phones and tablets to make experiences feel more real. Learning these technologies may help you get a good job in the market.</li><li><strong>Blockchain Technology:&nbsp;</strong>One of the world’s most popular modes of transaction, widely regarded as secure, in which people use virtual coins to pay their bills. This technology gave rise to the Bitcoin and Ethereum we see today. There are many such cryptocurrencies out there that a person can study, along with the behind-the-scenes workings of the blockchain. The technology has gained immense popularity since its creation, and many people are investing their money in it. So, I would recommend learning about it to get a good job at a top-tier company.</li><li><strong>Cognitive Cloud Computing:</strong>&nbsp;The cloud as we see it today is not what it was in the past. Here I am talking about the cloud-based services provided by companies like Microsoft, Google and Amazon. Over time these services have evolved so much that we can now do nearly any type of computational work in the cloud, such as hosting websites, storing databases, performing AI-related work and deploying that work. These capabilities were at first out of reach for ordinary people, but with the modern cloud we can access them and use them for our benefit. Today many companies hire cloud engineers rather than conventional programmers because the cloud allows faster implementation, and they pay them high salaries. So, I would personally suggest learning this technology to get a decent job and stay ahead of others.</li></ol>
<p>The post <a href="https://www.aiuniverse.xyz/top-5-technologies-that-one-should-master-in-2020/">Top 5 Technologies that One Should Master in 2020</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/top-5-technologies-that-one-should-master-in-2020/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
