Source – digitaljournal.com
Washington – A new artificial intelligence tool can create realistic videos from audio files alone. This technology, developed at the University of Washington, has been tested on speeches made by former President Obama.
The technology is based on newly developed algorithms, which are designed to overcome a limitation in ‘computer vision’: turning audio clips into realistic, lip-synced videos of the person who is speaking the words. The algorithms learn from videos that exist “in the wild”, such as on the Internet or elsewhere.
This involved two steps. The first was training a neural network (a collection of connected units called artificial neurons) to watch videos of an individual and then translate different audio sounds into basic mouth shapes. The second was using a new mouth synthesis technique to realistically superimpose mouth shapes and textures onto an existing reference video of a given person.
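The two stages described above can be illustrated with a toy sketch. This is not the researchers' method: a simple ridge regression stands in for the neural network, a single "mouth opening" number stands in for the mouth shapes, and a plain alpha blend stands in for the mouth synthesis technique. All function names, the synthetic audio and the parameter values are assumptions made purely for illustration.

```python
import numpy as np

def frame_loudness(audio, frame_len):
    """Split a 1-D audio signal into frames and return the RMS loudness of each."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def stack_context(features, context):
    """Pair each frame's feature with those of the `context` frames before it."""
    padded = np.concatenate([np.full(context, features[0]), features])
    return np.stack([padded[i : i + context + 1] for i in range(len(features))])

def fit_mouth_model(X, y, ridge=1e-3):
    """Stage 1 (toy stand-in): ridge regression from audio features to mouth opening."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_mouth(X, w):
    """Apply the fitted weights to audio features."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w

def paste_mouth(frame, mouth_patch, top, left, alpha=0.8):
    """Stage 2 (toy stand-in): alpha-blend a mouth patch into a reference frame."""
    out = frame.astype(float)  # astype returns a copy, so `frame` is untouched
    h, w = mouth_patch.shape[:2]
    region = out[top : top + h, left : left + w]
    out[top : top + h, left : left + w] = alpha * mouth_patch + (1 - alpha) * region
    return out

# Synthetic data (an assumption): noise whose loudness follows the mouth opening.
rng = np.random.default_rng(0)
n_frames, frame_len = 200, 160
mouth = 0.1 + np.abs(np.sin(np.linspace(0, 12, n_frames)))
audio = np.repeat(mouth, frame_len) * rng.standard_normal(n_frames * frame_len)

# Stage 1: learn audio -> mouth opening, then predict it back from the audio.
X = stack_context(frame_loudness(audio, frame_len), context=2)
weights = fit_mouth_model(X, mouth)
predicted = predict_mouth(X, weights)

# Stage 2: composite a (placeholder) mouth patch onto a (placeholder) video frame.
frame = np.zeros((8, 8))
patch = np.ones((2, 2))
composited = paste_mouth(frame, patch, top=2, left=2, alpha=0.5)
```

The sketch trains and predicts on the same synthetic clip, so it shows only the shape of the pipeline (audio features in, mouth parameters out, patch blended into a frame), not how well such a model would generalise.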
To test the technology, the research group generated a realistic video of Barack Obama discussing subjects as diverse as terrorism, fatherhood and employment. The video was created from audio clips combined with separate, existing video footage of the former president. It overcomes a major problem with adding audio to video, where the mouth of the speaker appears unrealistic.
Discussing the outcome, lead researcher Professor Ira Kemelmacher-Shlizerman enthused: “These type of results have never been shown before.” Achieving this required an artificial intelligence algorithm capable of learning and anticipating the intricate patterns of human speech. Obama was chosen for the project because of the sheer volume of available recordings.
The technology will be presented at SIGGRAPH 2017 in August. A white paper discussing the technology has been produced, titled “Synthesizing Obama: Learning Lip Sync from Audio”.
What does this technology offer businesses?
The advantages to businesses are considerable. High quality audio recordings could be made and later turned into videos of a higher resolution than would be possible using a standard camera. The same could be done with archival sound recordings, an area that may appeal to the entertainment industry. Imagine, for example, being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio.
What could this mean for you?
For consumers, video chat tools like Skype, Google Hangouts or Messenger will enable any person to collect videos that could be used to train computer models. A further appeal to businesses is that, since streaming audio over the Internet requires much less bandwidth than video, the new software could put an end to video chats that ‘time out’ as a result of poor connections. This would be done by reversing the process, that is, feeding video into the network instead of just audio. Often with ‘video chats’ the audio is good but the video is poor, which frustrates many business professionals and hampers attempts by businesses to reduce the number of meetings by ‘going digital’.