Now You Can Generate Music From Scratch With OpenAI’s Neural Net Model

Source: analyticsindiamag.com

OpenAI, one of the best-known AI research labs, has been working extensively in artificial intelligence, particularly on neural networks and reinforcement learning. A few days ago, the lab introduced Microscope for AI enthusiasts who are interested in exploring how neural networks work.

Now OpenAI’s audio team has introduced a new machine learning model called Jukebox, which generates music, including singing, in the raw audio domain. The model takes genre, artist, and lyrics as input and generates new music samples from scratch.

Over the past few years, generative modelling has made considerable progress. One of its central goals is to capture the important features of the data and create new instances that are indistinguishable from the true data.

In this work, the researchers used state-of-the-art deep generative models to build a single system capable of generating diverse, high-fidelity music in the raw audio domain with long-range coherence spanning multiple minutes. The researchers stated, “We chose to work on music because we want to continue to push the boundaries of generative models.”

Behind Jukebox

Jukebox is a neural network model that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. Unlike many other music generation models, Jukebox models music directly as raw audio. Generating music at the audio level is challenging because the sequences involved are very long.
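To give a rough sense of what "very long sequences" means, the sketch below counts the audio samples a model would have to produce for clips of different lengths. It assumes the 44.1 kHz sample rate mentioned later in the article and a single (mono) channel; the function name is a hypothetical helper, not part of any released code.

```python
SAMPLE_RATE_HZ = 44_100  # assumed: the 44.1 kHz rate cited for the training audio


def raw_audio_timesteps(duration_seconds: float,
                        sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples (timesteps) in one mono channel of the given length."""
    return int(duration_seconds * sample_rate_hz)


if __name__ == "__main__":
    for label, seconds in (("30 seconds", 30), ("1 minute", 60), ("4 minutes", 240)):
        print(f"{label}: {raw_audio_timesteps(seconds):,} timesteps")
```

Under these assumptions, a four-minute song is more than ten million timesteps, which is why the sequence is compressed before it is modelled.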

One way to address the problem of long inputs is to use an autoencoder that compresses raw audio to a lower-dimensional space by discarding some of the perceptually irrelevant bits of information. Jukebox’s autoencoder compresses audio to a discrete space using a quantisation-based approach called VQ-VAE.

VQ-VAE downsamples extremely long inputs into a shorter discrete latent encoding using vector quantisation. Jukebox uses a hierarchical VQ-VAE architecture to compress audio into a discrete space, along with a loss function designed to retain the maximum amount of musical information.
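The core of vector quantisation can be sketched in a few lines: each continuous latent vector is replaced by the index of its nearest entry in a codebook. This is only a minimal illustration of that lookup step, not Jukebox's implementation; the codebook values, the 2-dimensional latents, and the function names are all made up for the example, and a real VQ-VAE learns its encoder, decoder, and codebook jointly.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantise(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantise(codes: List[int], codebook: List[Vector]) -> List[Vector]:
    """Look the discrete codes back up to get the quantised vectors."""
    return [codebook[k] for k in codes]


if __name__ == "__main__":
    # Hypothetical 3-entry codebook and three encoder outputs.
    codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
    latents = [(0.1, -0.2), (0.9, 1.2), (-0.8, 0.7)]
    codes = quantise(latents, codebook)
    print(codes)                        # [0, 1, 2]
    print(dequantise(codes, codebook))  # the matching codebook vectors
```

The discrete codes are what a downstream generative model would be trained on, which is much cheaper than modelling every raw audio sample.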

According to the researchers, while previous work generated raw audio music in the 20–30 second range, the new model can generate pieces that are multiple minutes long, with recognisable singing in natural-sounding voices.

Dataset Used

To train the Jukebox model, the researchers crawled the web to curate a new dataset of 1.2 million songs, of which 600,000 were in English. The songs were paired with the corresponding lyrics and metadata from LyricWiki, where the metadata includes artist, album genre, and year of the songs, along with common moods or playlist keywords associated with each song. The model is trained on 32-bit, 44.1 kHz raw audio, and data augmentation is performed by randomly downmixing the right and left channels to produce mono audio.
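The downmixing augmentation could look something like the sketch below. The article does not say how the random mix is drawn, so the random convex weighting used here (a weight w for the left channel and 1 - w for the right) is an assumption, as is the hypothetical function name random_downmix.

```python
import random
from typing import List, Optional


def random_downmix(left: List[float], right: List[float],
                   rng: Optional[random.Random] = None) -> List[float]:
    """Blend two stereo channels into one mono channel with a random weight.

    Assumption: the mix is a convex combination w*left + (1-w)*right with
    w drawn uniformly from [0, 1). The source does not specify the scheme.
    """
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    rng = rng or random.Random()
    w = rng.random()
    return [w * l + (1.0 - w) * r for l, r in zip(left, right)]


if __name__ == "__main__":
    left = [0.0, 0.5, 1.0, -1.0]
    right = [0.2, -0.5, 0.0, 1.0]
    print(random_downmix(left, right, random.Random(0)))
```

Because the weight changes on every call, the same stereo track yields a slightly different mono signal each time it is seen during training.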

Limitations of This Model

The researchers noted that there is still a significant gap between the generated music and human-created music. Some of the limitations are listed below:

The generated songs show local musical coherence, follow traditional chord patterns, and can even feature impressive solos, but they lack familiar larger musical structures such as choruses that repeat within a song
The downsampling and upsampling process introduces discernible noise. Improving the VQ-VAE so that it captures more musical information would help reduce this issue
Because sampling is autoregressive, generation is slow. According to the researchers, it takes approximately 9 hours to fully render one minute of audio through the models, so they cannot yet be used in interactive applications
Currently, the model is trained only on English lyrics and mostly Western music; songs in other languages have yet to be covered
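The speed limitation follows from how autoregressive sampling works: each new token depends on the tokens generated before it, so the steps cannot be run in parallel. The toy sketch below shows that loop together with the arithmetic behind the 9-hours-per-minute figure. The "model" is a made-up stand-in that simply favours repeating the previous token, and all function names are hypothetical.

```python
import random
from typing import List


def toy_next_token_weights(context: List[int], vocab_size: int) -> List[float]:
    """Hypothetical stand-in for a trained model: favours the previous token."""
    weights = [1.0] * vocab_size
    if context:
        weights[context[-1]] += 2.0
    return weights


def sample_autoregressively(num_steps: int, vocab_size: int, seed: int = 0) -> List[int]:
    """Generate tokens one at a time; step t needs the output of steps 0..t-1."""
    rng = random.Random(seed)
    tokens: List[int] = []
    for _ in range(num_steps):
        weights = toy_next_token_weights(tokens, vocab_size)
        tokens.append(rng.choices(range(vocab_size), weights=weights, k=1)[0])
    return tokens


def slowdown_vs_real_time(render_hours: float, audio_minutes: float) -> float:
    """How many minutes of compute are spent per minute of generated audio."""
    return (render_hours * 60.0) / audio_minutes


if __name__ == "__main__":
    print(sample_autoregressively(num_steps=10, vocab_size=4))
    print(slowdown_vs_real_time(9, 1))  # 540.0
```

At roughly 540 minutes of rendering per minute of audio, generation is far from the real-time speed an interactive tool would need.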

Wrapping Up

OpenAI has been working for a few years on generating audio samples conditioned on different kinds of priming information. With Jukebox, the researchers hope to improve the musicality of samples with unique lyrics and to give musicians more control over the generations. They have released the model weights and code, along with a tool for exploring the generated samples.

This is not the first time that the San Francisco-based AI research laboratory has applied AI to create music. Last year, OpenAI introduced MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments and combine styles from country to Mozart and the Beatles.
