Google Launches Real-Time 70-Language Voice Translation Model

Ledge AI
Sunday, June 14, 2026
  • Google introduced Gemini 3.5 Live Translate, a voice-to-voice translation model supporting 70 languages.
  • The model utilizes streaming processing to track conversations and generate translated audio with a delay of only a few seconds.
  • Google will deploy the service across mobile apps, Google Meet, and its API to facilitate multilingual communication.

On June 9, 2026 (US time), Google announced Gemini 3.5 Live Translate, a new model capable of near real-time voice-to-voice translation. Supporting over 70 languages, the model uses streaming technology to generate translated audio that retains the original speaker's intonation and speaking pace. Unlike previous translation systems that required waiting for a speaker to finish, this model follows the flow of conversation, outputting translated speech just seconds after the initial utterance.

For general users, the feature will roll out via the Google Translate app on Android and iOS. Android devices gain a new 'listening mode' that plays translated audio through the earpiece, as if the user were on a phone call. This lets users hear translations privately in crowded environments, even without headphones. Additionally, translation in Google Meet will expand from 5 supported languages to 70, with a private preview available to select Google Workspace business customers starting this month.

Developers can access the technology through the Gemini Live API and Google AI Studio using the model code 'gemini-3.5-live-translate-preview,' which outputs both translated audio and text. Companies such as the ride-hailing service Grab are already testing it to bridge communication between travelers and drivers. All generated audio is embedded with a SynthID watermark, a digital watermarking technology used to identify AI-generated content and support safety and verifiability.

The company noted remaining technical challenges, including handling sudden voice changes after long pauses, maintaining consistency when multiple people speak simultaneously, and maintaining detection accuracy with non-native accents or rapid language switching. The model is based on the Gemini 3 Pro foundation model, and the technology is expected to serve as language infrastructure not only for personal use but also for professional applications such as meetings, classrooms, and customer support systems.

Read original (Japanese)·Jun 12, 2026
#google#gemini#speech to speech#translation#real time#synthid