Recreate Any Voice Using One Minute of Sample Audio

A Montreal-based startup developed a set of deep learning algorithms that can copy anyone’s voice with only 60 seconds of sample audio.

Lyrebird, a startup spun off from the MILA lab at the University of Montréal and advised by Aaron Courville and Yoshua Bengio, claims to be the first of its kind to allow copying a voice in a matter of minutes and controlling the emotion of the generated speech.

Using CUDA, TITAN X Pascal GPUs and cuDNN with the Theano deep learning framework, the team trained a recurrent neural network on two speakers, one male and one female, each reading ten hours of audiobooks. Once trained, the algorithm can generate 1,000 sentences in less than half a second. Their related paper, “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model,” provides more details about the model.

The company unveiled an impressive public demo this week consisting of a series of audio samples imitating Donald Trump, Barack Obama, and Hillary Clinton. The samples are not completely believable yet, but should improve over time.

The resulting speech can be put to a wide range of uses, says Lyrebird, including “reading of audio books with famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, for animation movies or for video game studios.”

Lyrebird’s developer API is still under development with no timetable on the release, but more than 6,000 people have registered for early access.


6 thoughts on “Recreate Any Voice Using One Minute of Sample Audio”

  1. RadioactiveLobster on May 1, 2017 at 6:08 am said:

    This surely won’t be abused in any way.

    • Yan Bellavance on August 8, 2017 at 9:24 pm said:

      you raise a huuuge point…

  2. Mayowa Osibodu on May 5, 2017 at 6:09 am said:

    Impressive. It inspires an intriguing perspective– Detangling the voice from the person.

    • Yan Bellavance on August 8, 2017 at 9:23 pm said:

      you decide how you deal with nodes in the web universe using security schemes protocol and such.

      The Ethereum ecosystem (cryptocurrencies, world super computer, contract programming, etc) is concerned with this and some projects are looking to implement a scheme to make sure you know who you are dealing with

  3. Yan Bellavance on August 8, 2017 at 9:34 pm said:

    gonna a ssh key lol

  4. Pocket Rocket on November 28, 2017 at 12:32 am said:

    So that’s how they got Morgan Freeman doing the voice over for literally everything these days.