Google's Translatotron makes translation less awkward by mimicking the speaker's voice

What you need to know

Translatotron is Google's new speech-to-speech translation model.
By translating speech directly into speech, it skips transcribing the source into text, which can speed up translation.
Translatotron can also mimic the speaker's voice and cadence, making the results sound more human and less robotic.

The ability to communicate with each other all over the world has been paramount to building our society. From exchanging ideas to trading goods and more, language is at the center of everything we do.

The problem is that we don't all speak the same language, and that's where translators have made themselves invaluable over the years. With each new generation, translation tools are becoming smarter and faster, and Google has just taken a new step forward in the field with something it calls Translatotron.

Currently, translating speech requires three steps: transcribing the source audio into text, translating that text into the target language, and finally converting the translated text back into speech.

With Translatotron, Google cuts out the intermediate text and goes straight from speech in one language to speech in another. One benefit of this approach is that it can be faster than the system we use now. Skipping the speech-to-text step also avoids some of the typical errors introduced during that conversion, which can make translations more accurate.

However, the most impressive feature of Translatotron is that it can retain some of the characteristics of the original speaker's voice and cadence.

That is something the old method could never achieve, and it makes the translation sound more human and less robotic. After all, it's not only what we say that matters, but how we say it.

Google has included some samples in its blog post and even more on its GitHub page. They are worth checking out if you want to hear how Translatotron retains aspects of the original speaker's voice. While it is far from perfect and still sounds somewhat robotic, the results are a big improvement over what we have today.