Can AI fix the issues most translator apps face today?

Lost in translation: Can AI tools improve?  

  • AI translation tools often mix up meaning and context when translating.
  • Spotify has developed a Voice Translation tool that can translate podcasts.
  • The tool can also reproduce the speaker’s own voice in the translation.

Translation tools have been around for some time. In fact, text-to-text translation apps like Google Translate remain among the most widely used translation apps today. Now, with AI embedded into translation platforms, these apps can do more than just text-to-text translation.

For example, Google Translate now supports text-to-text, voice-to-text and even image-to-text translation. Users can choose to upload a picture with words and translate it, record a voice note and have it translated, or even just type out a text to be translated.

This is made possible by neural machine translation (NMT), together with Google’s Translate Community. Google’s NMT is an end-to-end learning approach to automated translation that has the potential to overcome many weaknesses of conventional phrase-based translation systems.

Despite the developments in the field, translation applications still have one big problem: producing translations that make sense. While there have been some improvements here, they may not extend to every language being translated.

A Tweet on upcoming Universal AI Translator by Google.


The need for real-time translation tools

During the height of the pandemic, demand for live AI translation applications soared, especially as many users relied on video communication platforms for work. While platforms like Zoom and Teams offered real-time voice-to-text transcription, real-time voice-to-text translation arrived much later.

Now, most video conferencing platforms include live transcription in several languages, with some even offering automatic translation. One example is Happy Scribe, whose audio-to-text program can translate text into more than 120 languages and accents. Another live AI translation tool is Otter.ai. Its AI-powered live transcription service also comes with an automated summary, highlights and full audio transcripts.

Apart from these examples, Microsoft and Google have also enhanced their translation offerings. Google recently announced enhancements to Bard, which can be integrated into Google Workspace. These include voice-to-text translation in over 40 languages.

OpenAI has been working to improve speech recognition, a technology that still has issues to fix. If a voice recognition tool cannot handle accents and comprehend speech, the AI translators that rely on it will keep producing results that are lost in translation.

Whisper by OpenAI is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language.

Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. OpenAI is open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.
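
For readers who want to try the open-sourced code, the two modes described above (transcription in the spoken language, or translation into English) could be exercised roughly as follows. This is a minimal sketch, assuming the `openai-whisper` Python package and ffmpeg are installed. The helper names `build_options` and `run_whisper` and the audio file name are hypothetical and used only for illustration; `load_model` and `transcribe` are the package's own calls.

```python
from typing import Optional

# The two tasks the open-source Whisper package supports:
# "transcribe" keeps the spoken language, "translate" outputs English.
VALID_TASKS = {"transcribe", "translate"}


def build_options(task: str = "transcribe", language: Optional[str] = None) -> dict:
    """Hypothetical helper: build keyword arguments for Whisper's transcribe()."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    options = {"task": task}
    if language is not None:
        # Optional hint such as "fr"; if omitted, Whisper detects the language.
        options["language"] = language
    return options


def run_whisper(audio_path: str, task: str = "transcribe",
                language: Optional[str] = None, model_size: str = "base") -> str:
    """Hypothetical helper: process one audio file and return the text."""
    # Imported lazily so the module loads even where Whisper is not installed.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path, **build_options(task, language))
    return result["text"]


# Example call (not executed here; "interview_fr.mp3" is a made-up file name):
#   english_text = run_whisper("interview_fr.mp3", task="translate")
```

Note that this only covers speech-to-text. It does not reproduce a speaker's voice, which is a separate voice generation step.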

Spotify introduces Voice Translation (Image – Spotify)

Spotify AI translation

Building on these capabilities, Spotify has unveiled a pilot of Voice Translation for podcasts. The AI-powered feature not only translates podcasts into additional languages but also does so in the podcaster’s own voice.

According to a statement by Spotify, the tool draws on innovations such as OpenAI’s voice generation technology, including Whisper, to match the original speaker’s style. Put simply, the technology clones the voice, producing a listening experience that sounds more personal and natural than traditional dubbing.

Currently, most voice translators sound robotic and unnatural despite their capabilities. Spotify’s announcement comes as OpenAI announced similar voice features for ChatGPT.

For now, Spotify says the AI translation works on English podcast episodes, which can be translated into other languages while keeping the speaker’s distinctive speech characteristics. Spotify has worked with several podcasters to generate AI-powered voice translations from English into other languages, including Spanish, French and German, for a select number of catalog episodes and future episode releases.

“By matching the creator’s own voice, Voice Translation gives listeners around the world the power to discover and be inspired by new podcasters in a more authentic way than ever before,” says Ziad Sultan, VP of Personalization. “We believe that a thoughtful approach to AI can help build deeper connections between listeners and creators, a key component of Spotify’s mission to unlock the potential of human creativity.”

Voice-translated episodes from pilot creators will be available worldwide to Premium and Free users. The initial AI translation will be from English to Spanish, French and German, with other language capabilities expected to be rolled out in the coming weeks.

While Spotify’s AI tool may be able to clone voices and translate podcasts, it remains to be seen how effective it will actually be. Direct translation could remain an issue even if the AI reproduces the speaker’s tone, simply because many things lose their meaning when carried over into another language.

For example, a joke may sound funny in English, but when translated into French and delivered in the same tone, it may not make sense. This is an area that AI probably cannot fix yet. But for now, AI translation is doing enough to break the language barrier.