<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>meta-learning Archives - Artificial Intelligence</title>
	<atom:link href="https://www.aiuniverse.xyz/tag/meta-learning/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.aiuniverse.xyz/tag/meta-learning/</link>
	<description>Exploring the universe of Intelligence</description>
	<lastBuildDate>Mon, 01 Mar 2021 07:07:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>
	<item>
		<title>What Is Meta-Learning via Learned Losses (with Python Code)</title>
		<link>https://www.aiuniverse.xyz/what-is-meta-learning-via-learned-losses-with-python-code/</link>
					<comments>https://www.aiuniverse.xyz/what-is-meta-learning-via-learned-losses-with-python-code/#respond</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Mon, 01 Mar 2021 07:07:28 +0000</pubDate>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Learned]]></category>
		<category><![CDATA[Losses]]></category>
		<category><![CDATA[meta-learning]]></category>
		<category><![CDATA[What]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=13145</guid>

					<description><![CDATA[<p>Source &#8211; https://analyticsindiamag.com/ Facebook AI Research (FAIR) work on meta-learning falls broadly into two types: first, methods that learn representations for generalization; second, methods that learn to optimize models. We discussed the first type thoroughly in our previous article on MBIRL. In this post, we give a brief introduction to the second <a class="read-more-link" href="https://www.aiuniverse.xyz/what-is-meta-learning-via-learned-losses-with-python-code/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/what-is-meta-learning-via-learned-losses-with-python-code/">What Is Meta-Learning via Learned Losses (with Python Code)</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Source &#8211; https://analyticsindiamag.com/</p>



<p>Facebook AI Research (FAIR) work on meta-learning falls broadly into two types: first, methods that learn representations for generalization; second, methods that learn to optimize models. We discussed the first type thoroughly in our previous article on MBIRL. In this post, we give a brief introduction to the second type. At the International Conference on Pattern Recognition (ICPR), Italy, January 10&#8211;15, 2021, a group of researchers, <em>S. Bechtle, A. Molchanov, Y. Chebotar, E. Grefenstette, L. Righetti, G. S. Sukhatme, and F. Meier</em>, presented a research paper focusing on automating the &#8220;meta-training&#8221; process: <strong>Meta-Learning via Learned Loss</strong>.</p>



<p><strong>Motivation Behind ML</strong><strong><sup>3</sup></strong></p>



<p>In meta-learning, the goal is to efficiently optimize a function <em>f<sub>θ</sub></em>, which can be a regressor or a classifier, by finding the optimal value of <em>θ</em>. Here <em>L</em> is the loss function and <em>h</em> is the gradient transform. Most work in deep learning learns <em>f</em> directly from data, and some meta-learning work focuses on the parameter update rule. The <strong>ML<sup>3</sup></strong> approach instead targets the loss itself. Loss functions are architecture-independent and common to all learning problems, so a learned loss function requires no task-specific engineering or optimizer design, and it allows additional information to be injected during meta-training.</p>
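<p>To make the roles concrete, here is a minimal sketch of the generic update θ ← θ &#8211; α·h(∇<sub>θ</sub>L), with <em>h</em> chosen as the sign function; this choice is purely illustrative and not the paper's.</p>

```python
import numpy as np

# Illustration of the generic update theta <- theta - alpha * h(grad),
# where h is a gradient transform. Here h = sign (an illustrative choice,
# not the paper's) and the task loss is simply L(theta) = theta^2.
theta, alpha = 5.0, 0.1
for _ in range(100):
    grad = 2.0 * theta              # dL/dtheta for L = theta^2
    theta -= alpha * np.sign(grad)  # transformed-gradient step
# theta ends within one step size of the minimum at 0
```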



<p>The key idea of the proposed framework is a meta-training pipeline that not only optimizes the performance of the model but also generalizes across tasks and model architectures. The resulting learned loss functions efficiently optimize models for new tasks. The main contributions of the ML<sup>3</sup>&nbsp;framework are:</p>



<p>i) It is capable of learning adaptive, high-dimensional loss functions via backpropagation and gradient descent.</p>



<p>ii) The framework is flexible: it can encode additional information at meta-train time, and it generalizes across regression, classification, and both model-based and model-free reinforcement learning.</p>



<p><strong>The Model Architecture of ML</strong><strong><sup>3</sup></strong></p>



<p>Learning a loss function is framed as bi-level optimization, i.e., it contains two optimization loops. The inner loop trains the model, or <em>optimizee</em>, with gradient descent on the learned meta-loss function, and the outer loop optimizes the meta-loss function by minimizing the task loss, i.e., a regression, classification, or reinforcement-learning loss.</p>



<p>The setup contains a function <em>f</em>, parameterized by <em>θ</em>, that maps an input <em>x</em> to an output <em>y</em>. A meta-loss network <em>M</em>, parameterized by <em>Φ</em>, takes the input and output of <em>f</em> together with task-specific information <em>g</em> (for example, the ground-truth label in regression or classification, the final position in MBIRL, or the sampled reward in model-free reinforcement learning) and outputs the meta-loss value <em>L</em>, which depends on both <em>Φ</em> and <em>θ</em>.</p>
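<p>A meta-loss network of this kind can be sketched as a tiny MLP; the architecture below (one ReLU layer, a softplus output to keep the loss nonnegative) is our assumption for illustration, not the paper's exact network.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of a meta-loss network M_phi: it maps the model output f(x) and
# the task information g to a nonnegative scalar loss. This tiny
# architecture is an assumption for illustration only.
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0

def meta_loss(y_pred, g):
    h = np.maximum(0.0, np.array([y_pred, g]) @ W1 + b1)  # ReLU hidden layer
    raw = h @ w2 + b2
    return np.log1p(np.exp(raw))  # softplus output keeps the loss >= 0
```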



<p>So, to update the function <em>f</em>, compute the gradient of the meta-loss <em>L</em> with respect to <em>θ</em> and take a gradient step on <em>θ</em> using the learned loss.</p>



<p>Now, to update <em>M</em>, the loss network, formulate a task-specific loss that compares the output of the currently optimized <em>f</em> with the target information. Since <em>f</em> is updated using <em>L</em>, the task loss is also a function of <em>Φ</em>, so a gradient step on <em>Φ</em> optimizes <em>M</em>. Together this forms a fully differentiable loss-learning framework for training.</p>
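<p>The two coupled updates can be written out end-to-end on a toy problem. In this didactic sketch (not the authors' code), the meta-loss network is collapsed to a single learned scale exp(φ) on a squared error so that every gradient can be computed by hand.</p>

```python
import numpy as np

# Didactic sketch of the ML3-style bi-level loop (not the authors' code).
# Model: f_theta(x) = theta * x. Learned loss: L_phi(y, g) = exp(phi)*(y - g)^2,
# where exp(phi) stands in for the whole meta-loss network.
alpha, eta = 0.01, 0.1   # inner (theta) and outer (phi) learning rates
theta0, phi = 0.0, 0.0   # model initialization and meta-parameter
x, g = 2.0, 3.0          # task: input and ground-truth target

for _ in range(200):
    # Inner step: update theta with the gradient of the *learned* loss.
    theta = theta0
    grad_theta = np.exp(phi) * 2.0 * (theta * x - g) * x
    theta_new = theta - alpha * grad_theta

    # Outer step: differentiate the *task* loss of the updated model
    # through the inner step to get the gradient with respect to phi.
    task_err = theta_new * x - g
    dtheta_dphi = -alpha * np.exp(phi) * 2.0 * (theta * x - g) * x
    grad_phi = 2.0 * task_err * x * dtheta_dphi
    phi -= eta * grad_phi
# The learned scale now makes a single inner step land f near the target.
```

<p>The outer gradient flows through the inner update, which is exactly why the learned loss must be differentiable with respect to its parameters.</p>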



<p>To use the learned loss at test time, directly update <em>f</em> by taking the gradient of the learned loss <em>L</em> with respect to the parameters of <em>f</em>.</p>
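<p>A test-time sketch of this idea: a frozen learned loss of the form exp(φ)·(f(x) &#8211; g)² adapts a fresh linear model to a new task with plain gradient descent. The value of φ here is a hypothetical meta-trained value chosen only for illustration.</p>

```python
import numpy as np

# Test-time sketch: a frozen, meta-trained loss is used on its own to adapt
# a fresh linear model f(x) = theta * x to a new task.
phi = 2.5                  # frozen meta-parameter (assumed value)
alpha = 0.01               # step size for the model parameters
x_new, g_new = 1.5, -0.75  # the new task: input and target
theta = 0.0                # freshly initialized model

for _ in range(50):
    # Gradient of the learned loss exp(phi)*(theta*x - g)^2 w.r.t. theta;
    # no task loss and no outer loop are needed at test time.
    grad = np.exp(phi) * 2.0 * (theta * x_new - g_new) * x_new
    theta -= alpha * grad
```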



<p><strong>Applications of ML</strong><strong><sup>3</sup></strong></p>



<ol class="wp-block-list"><li>Regression problems.</li><li>Classification problems.</li><li>Shaping the loss during training, e.g., convexifying the loss or adding an exploration signal; ML<sup>3</sup>&nbsp;makes it possible to add such extra information during meta-training.</li><li>Model-based reinforcement learning.</li><li>Model-free reinforcement learning.</li></ol>



<p><strong>Requirements &amp; Installation</strong></p>



<ol class="wp-block-list"><li>Python 3.7</li><li>Clone the GitHub repository via <em>git</em>.</li><li>Install all the dependencies of ML<sup>3</sup> as described in the repository.</li></ol>



<p><strong>Paper Experiment Demos</strong></p>



<p>This section contains different experiments mentioned in the research paper.</p>



<p><strong>A. Loss Learning for Regression</strong></p>



<ol class="wp-block-list"><li>Run the sine-function regression experiment with the training script provided in the repository.</li></ol>
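<p>In sine-fitting meta-learning benchmarks, each regression task is typically sampled as y = A·sin(x + p) with random amplitude and phase. The ranges below are common assumptions, not values taken from the ML<sup>3</sup> repository.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sine_task(n_points=100):
    """Sample one regression task y = A * sin(x + p). The amplitude and
    phase ranges are assumptions typical of sine-fitting benchmarks."""
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)
    x = rng.uniform(-5.0, 5.0, size=(n_points, 1))
    y = amplitude * np.sin(x + phase)
    return x, y

# A small batch of meta-training tasks; each task is one (x, y) dataset.
tasks = [sample_sine_task() for _ in range(4)]
```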



<ol class="wp-block-list" start="2"><li>Now you can visualize the results as follows:</li></ol>



<p>2.1 Import the required libraries, packages, and modules, and specify the path to the data saved during meta-training.</p>



<p>2.2 Load the data saved during the experiment.</p>



<p>2.3 Visualize the performance of the meta-loss when it is used to optimize the meta-training tasks, as a function of (outer) meta-training iterations.</p>



<p>2.4 Evaluate the learned meta-loss network on test tasks: plot the performance of the final meta-loss network when it is used to optimize new test tasks at meta-test time. Here the x-axis represents the number of gradient-descent steps.</p>
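<p>Since the repository's file layout is not reproduced here, the following self-contained sketch stands in for steps 2.1&#8211;2.4: it saves and reloads a fake training curve with numpy and computes the quantity one would plot. The file name and the array layout (one task-loss value per outer iteration) are assumptions, not the repository's actual format.</p>

```python
import os
import tempfile

import numpy as np

# Stand-in for "load the data saved during meta-training": a fake curve of
# per-iteration task losses is persisted and read back.
losses = np.exp(-np.linspace(0.0, 5.0, 100)) + 0.01  # fake training curve
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "meta_train_losses.npy")  # assumed file name
    np.save(path, losses)         # what a training script might persist
    loaded = np.load(path)        # step 2.2: load the experiment data

# Steps 2.3/2.4: the quantity one would plot is task loss vs. (outer)
# iteration, e.g. with matplotlib: plt.plot(loaded).
improvement = float(loaded[0] - loaded[-1])
```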



<p><strong>C. Learning with extra information at the meta-train time</strong></p>



<p>This demo shows how extra information can be added during meta-training in order to shape the loss function. For illustration we again use the sine-function example. The script requires two arguments: the first selects <em>train</em> or <em>test</em> mode, and the second (<em>True</em>/<em>False</em>) indicates whether to use extra information.</p>



<ol class="wp-block-list"><li>For training, run the script in <em>train</em> mode with the extra-information flag set to <em>True</em>.</li><li>To test the loss learned with extra information, run the script in <em>test</em> mode.</li></ol>



<ol class="wp-block-list" start="3"><li>For comparison, repeat the above two steps with the flag set to <em>False</em>.</li><li>Compare the results via visualization.</li></ol>
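<p>The two-argument interface described above can be sketched with argparse; the argument names and accepted values below are assumptions, not the repository's exact command-line interface.</p>

```python
import argparse

# Hypothetical command-line interface matching the description above.
def parse_args(argv):
    parser = argparse.ArgumentParser(description="ML3 extra-info demo (sketch)")
    parser.add_argument("mode", choices=["train", "test"],
                        help="whether to meta-train or to test the learned loss")
    parser.add_argument("extra_info", choices=["True", "False"],
                        help="whether to shape the loss with extra information")
    return parser.parse_args(argv)

args = parse_args(["train", "True"])  # e.g. meta-train with extra information
```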



<p>Similarly, the experiment that meta-learns the loss with an additional goal can be run for the mountain-car task.</p>



<p><strong>EndNotes</strong></p>



<p>In this write-up we have given an overview of Meta-Learning via Learned Losses (ML<sup>3</sup>), a gradient-based bi-level optimization algorithm capable of learning any parametric loss function, as long as its output is differentiable with respect to its parameters. These learned loss functions can then be used to efficiently optimize models for new tasks.</p>



<p><strong>Note :</strong>&nbsp;All the figures/images except the output of the code are taken from official sources of ML<sup>3</sup>.</p>



<ul class="wp-block-list"><li><strong>Colab Notebook ML<sup>3</sup> Demo</strong></li></ul>



<p>Official Code, Documentation &amp; Tutorial are available at:</p>



<ul class="wp-block-list"><li>Github </li><li>Website </li><li>Research Paper</li></ul>



<p>The post <a href="https://www.aiuniverse.xyz/what-is-meta-learning-via-learned-losses-with-python-code/">What Is Meta-Learning via Learned Losses (with Python Code)</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/what-is-meta-learning-via-learned-losses-with-python-code/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Falling Walls: The Past, Present and Future of Artificial Intelligence</title>
		<link>https://www.aiuniverse.xyz/falling-walls-the-past-present-and-future-of-artificial-intelligence/</link>
					<comments>https://www.aiuniverse.xyz/falling-walls-the-past-present-and-future-of-artificial-intelligence/#comments</comments>
		
		<dc:creator><![CDATA[aiuniverse]]></dc:creator>
		<pubDate>Fri, 03 Nov 2017 05:41:42 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep Learning]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[general deep learning]]></category>
		<category><![CDATA[meta-learning]]></category>
		<guid isPermaLink="false">http://www.aiuniverse.xyz/?p=1630</guid>

					<description><![CDATA[<p>Source &#8211; scientificamerican.com As a boy, I wanted to maximize my impact on the world, so I decided I would build a self-improving AI that could learn to become much smarter than I am. That would allow me to retire and let AIs solve all of the problems that I could not solve myself—and also colonize <a class="read-more-link" href="https://www.aiuniverse.xyz/falling-walls-the-past-present-and-future-of-artificial-intelligence/">Read More</a></p>
<p>The post <a href="https://www.aiuniverse.xyz/falling-walls-the-past-present-and-future-of-artificial-intelligence/">Falling Walls: The Past, Present and Future of Artificial Intelligence</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Source &#8211; <strong>scientificamerican.com</strong></p>
<p>As a boy, I wanted to maximize my impact on the world, so I decided I would build a self-improving AI that could learn to become much smarter than I am. That would allow me to retire and let AIs solve all of the problems that I could not solve myself—and also colonize the universe in a way infeasible for humans, expanding the realm of intelligence.</p>
<p>So I studied mathematics and computers. My very ambitious 1987 diploma thesis described the first concrete research on meta-learning programs, which not only learn to solve a few problems but also learn to improve their own learning algorithms, restricted only by the limits of computability, to achieve super-intelligence through recursive self-improvement.</p>
<p>I am still working on this, but now many more people are interested. Why? Because the methods we’ve created on the way to this goal are now permeating the modern world—available to half of humankind, used billions of times per day.</p>
<p>As of August 2017, the five most valuable public companies in existence are Apple, Google, Microsoft, Facebook and Amazon. All of them are heavily using the deep-learning neural networks developed in my labs in Germany and Switzerland since the early 1990s—in particular, the Long Short-Term Memory network, or LSTM, described in several papers with my colleagues Sepp Hochreiter, Felix Gers, Alex Graves and other brilliant students and postdocs funded by European taxpayers.</p>
<p>In the beginning, such an LSTM is stupid. It knows nothing. But it can learn through experience. It is a bit inspired by the human cortex, each of whose more than 15 billion neurons are connected to 10,000 other neurons on average. Input neurons feed the rest with data (sound, vision, pain). Output neurons trigger muscles. Thinking neurons are hidden in between. All learn by changing the connection strengths defining how strongly neurons influence each other.</p>
<p>Things are similar for our LSTM, an artificial recurrent neural network (RNN), which outperforms previous methods in numerous applications. LSTM learns to control robots, analyze images, summarize documents, recognize videos and handwriting, run chat bots, predict diseases and click rates and stock markets, compose music, and much more. LSTM has become a basis of much of what&#8217;s now called deep learning, especially for sequential data (note that most real-world data is sequential).</p>
<p>In 2015, LSTM greatly improved Google&#8217;s speech recognition, now on over two billion Android phones. LSTM is also at the core of the new, much better Google Translate service used since 2016. LSTM is also in Apple&#8217;s QuickType and Siri on almost 1 billion iPhones. LSTM also creates the spoken answers of Amazon’s Alexa.</p>
<p>As of 2016, almost 30 percent of the awesome computational power for inference in all those Google data centers was used for LSTM. As of 2017, Facebook is using LSTM for a whopping 4.5 billion translations each day—more than 50,000 per second. You are probably using LSTM all the time. But other deep learning algorithms of ours are also now available to billions of users.</p>
<p>We called our RNN-based approaches “general deep learning,&#8221; to contrast them with traditional deep learning in the multilayer feed-forward neural networks (FNNs) pioneered by Ivakhnenko &amp; Lapa (1965) more than half a century ago in the Ukraine (which back then was part of the USSR). Unlike FNNs, RNNs such as LSTM have general purpose, parallel-sequential computational architectures. RNNs are to the more limited FNNs as general-purpose computers are to mere calculators.</p>
<p>By the early 1990s, our (initially unsupervised) deep RNNs could learn to solve many previously unlearnable tasks. But this was just the beginning. Every five years computers are getting roughly 10 times faster per dollar. This trend is older than Moore&#8217;s Law; it has held since Konrad Zuse built the first working program-controlled computer over the period 1935–1941, which could perform roughly one elementary operation per second. Today, 75 years later, computing is about a million billion times cheaper. LSTM has greatly profited from this acceleration.</p>
<p>Today&#8217;s largest LSTMs have a billion connections or so. Extrapolating this trend, in 25 years we should have rather cheap, human-cortex-sized LSTMs with more than 100,000 billion electronic connections, which are much faster than biological connections. A few decades later, we may have cheap computers with the raw computational power of all of the planet’s 10 billion human brains together, which collectively probably cannot execute more than 10<sup>30 </sup>meaningful elementary operations per second. And Bremermann’s physical limit (1982) for 1 kilogram of computational substrate is still over 10<sup>20</sup> times bigger than that. The trend above won&#8217;t approach this limit before the next century, which is still &#8220;soon&#8221; though—a century is just 1 percent of the 10,000 years human civilization has existed.</p>
<p>LSTM by itself, however, is a supervised method and therefore not sufficient for a true AI that learns without a teacher to solve all kinds of problems in initially unknown environments. That&#8217;s why for three decades I have been publishing on more general AIs.</p>
<p>A particular focus of mine since 1990 has been on unsupervised AIs that exhibit what I have called &#8220;artificial curiosity&#8221; and creativity. They invent their own goals and experiments to figure out how the world works, and what can be done in it. Such AIs may use LSTM as a submodule that learns to predict consequences of actions. They do not slavishly imitate human teachers, but derive rewards from continually creating and solving their own, new, previously unsolvable problems, a bit like playing kids, to become more and more general problem solvers in the process (buzzword: PowerPlay, 2011). We have already built simple &#8220;artificial scientists&#8221; based on this.</p>
<p>Extrapolating from this work, I think that within not so many years we&#8217;ll have an AI that incrementally learns to become as smart as a little animal—curiously and creatively and continually learning to plan and reason and decompose a wide variety of problems into quickly solvable (or already solved) sub-problems. Soon after we develop monkey-level AI we may have human-level AI, with truly limitless applications.</p>
<p>And it won&#8217;t stop there. Many curious AIs that invent their own goals will quickly improve themselves, restricted only by the fundamental limits of computability and physics. What will they do? Space is hostile to humans but friendly to appropriately designed robots, and offers many more resources than our thin film of biosphere, which receives less than a billionth of the sun&#8217;s light. While some AIs will remain fascinated with life, at least as long as they don&#8217;t fully understand it, most will be more interested in the incredible new opportunities for robots and software life out there in space. Through innumerable self-replicating robot factories in the asteroid belt and beyond they will transform the solar system and then within a few hundred thousand years the entire galaxy and within billions of years the rest of the reachable universe, held back only by the light-speed limit. (AIs or parts thereof are likely to travel by radio from transmitters to receivers—although putting these in place will take considerable time.)</p>
<p>This will be very different from the scenarios described in the science fiction novels of the 20th century, which also featured galactic empires and smart AIs. Most of the novels’ plots were very human-centric and thus unrealistic. For example, to make large distances in the galaxy compatible with short human life spans, sci-fi authors invented physically impossible technologies such as warp drives. The expanding AI sphere, however, won&#8217;t have any problems with physics&#8217; speed limit. Since the universe will continue to exist for many times its current 13.8-billion year age, there will probably be enough time to reach all of it.</p>
<p>Many sci-fi novels featured single AIs dominating everything. It is more realistic to expect an incredibly diverse variety of AIs trying to optimize all kinds of partially conflicting (and quickly evolving) utility functions, many of them generated automatically (my lab had already evolved utility functions in the millennium that just ended), where each AI is continually trying to survive and adapt to rapidly changing niches in AI ecologies driven by intense competition and collaboration that lie beyond our current imagination.</p>
<p>Some humans may hope to become immortal parts of these ecologies through brain scans and &#8220;mind uploads&#8221; into virtual realities or robots, a physically plausible idea discussed in fiction since the 1960s. However, to compete in rapidly evolving AI ecologies, uploaded human minds will eventually have to change beyond recognition, becoming something very different in the process.</p>
<p>So humans won&#8217;t play a significant role in the spreading of intelligence across the cosmos. But that&#8217;s OK. Don’t think of humans as the crown of creation. Instead view human civilization as part of a much grander scheme, an important step (but not the last one) on the path of the universe towards higher complexity. Now it seems ready to take its next step, a step comparable to the invention of life itself over 3.5 billion years ago.</p>
<p>This is more than just another industrial revolution. This is something new that transcends humankind and even biology. It is a privilege to witness its beginnings, and contribute something to it.</p>
<p>The post <a href="https://www.aiuniverse.xyz/falling-walls-the-past-present-and-future-of-artificial-intelligence/">Falling Walls: The Past, Present and Future of Artificial Intelligence</a> appeared first on <a href="https://www.aiuniverse.xyz">Artificial Intelligence</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.aiuniverse.xyz/falling-walls-the-past-present-and-future-of-artificial-intelligence/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
	</channel>
</rss>
