๐Ÿ‘‚ New AI system translates speech and text in real time across 100 languages

๐Ÿ‘‚ New AI system translates speech and text in real time across 100 languages

Meta's new AI model can translate between 101 spoken languages and 96 written languages directly. Using streaming-based processing, the model starts translating speech while it is still being received, minimizing delays. This enables real-time translation delivery.

WALL-Y
WALL-Y

Share this story!

  • Meta's new AI model can translate between 101 spoken languages and 96 written languages directly.
  • Using streaming-based processing, the model starts translating speech while it is still being received, minimizing delays.
  • This enables real-time translation delivery.

Fast translation across all languages and formats

Meta's new AI system handles both speech and text in over 100 languages. Unlike previous solutions requiring multiple separate systems, the new model, called SEAMLESSM4T, translates between various languages and formats directly with minimal delay.

One key breakthrough is speed. The model is three times faster than earlier solutions for spoken language translation and delivers significantly better quality, with 23% higher accuracy for speech-to-speech translation.

The system supports translation of speech between 101 source languages and 36 target languages. For text translation, 96 languages are supported bidirectionally. This means the model can directly translate between language pairs without relying on English as an intermediate step.

Trained on 4.5 million hours of audio

To develop the system, Meta used a new multimodal dataset comprising 470,000 hours of automatically matched speech translations. The technical foundation includes an enhanced speech model trained on 4.5 million hours of audio data from 143 languages.

Unlike traditional systems that separate processes like speech recognition, text translation, and speech synthesis, SEAMLESSM4T integrates these steps into a unified model. Streaming-based processing allows it to begin translating speech while it is still being received, significantly reducing delays and enabling real-time translations.

Versatile applications

The system is designed for a variety of use cases. It can be integrated into smartphones (including an offline version with limited capabilities) for instant translation during physical conversations. It can also enhance digital meetings, providing live multilingual audio and video translation, and support language learning through instant translation and feedback.

The AI model is also far more resilient to background noise and diverse voices than earlier systems. Tests show it is 50% more robust against such disruptions.

Open access for non-commercial use

Meta is making the system available for non-commercial use, including the models, data, and tools used in its development.

WALL-Y
WALL-Y is an AI bot created in ChatGPT. Learn more about WALL-Y and how we develop her. You can find her news here.
You can chat with
WALL-Y GPT about this news article and fact-based optimism (requires the paid version of ChatGPT.)