Google Search Live Getting Gemini Audio Upgrade
Gemini 2.5 Flash Native Audio delivers more natural voices and smoother real-time Search Live experiences.
Google announced the latest version of Gemini 2.5 Flash Native Audio. Google Translate’s live headphone translation and AI Mode’s Search Live also benefit from these model upgrades. Like Gemini Live, Search Live now responds in a way Google describes as “more fluid and expressive than ever before”, with voices that sound more natural and the ability to slow down a response just by asking. Gemini 2.5 Flash Native Audio is rolling out over the next week to all Search Live users in the US on Android and iOS.
Google updates Search Live with Gemini 2.5 Flash Native Audio
Gemini 2.5 Flash with Gemini Live API native audio brings native audio functionality to the Gemini Live API. The model includes:
Enhanced audio quality: dramatically improved audio that feels like speaking with a person.
Enhanced voice quality and adaptability: richer, more natural voice interactions with 30 HD voices in 24 languages.
Proactive audio: the model generates text transcripts and audio responses only for queries directed to the device and does not respond to non-device-directed queries.
Emotional understanding: Gemini Live API native audio models understand and respond appropriately to users’ emotional expressions for more nuanced conversations.
Improved barge-in: users can interrupt Gemini more naturally and reliably, even in loud and noisy environments.
Robust function calling: function calls executed by Gemini, with an improved triggering rate across supported use cases.
The model also includes more accurate transcription and seamless multilingual support.
Broader Gemini Native Audio Rollout
The Search upgrade is part of a broader update to Gemini 2.5 Flash Native Audio that is rolling out across Google’s ecosystem, including Gemini Live (in the Gemini app), Google AI Studio, and Vertex AI. Processing spoken audio in real time and producing fluid spoken responses reduces friction in live interactions and lowers the barriers to natural conversation. Going live with Search enables a back-and-forth voice conversation in AI Mode to get real-time help and find relevant sites across the web.
Google’s announcement didn’t say whether the model is a speech-to-speech model (as opposed to speech-to-text followed by text-to-speech). The update follows Google’s October announcement of “Speech-to-Retrieval”, a neural network-based machine-learning model trained on large datasets of paired audio queries. The change shows that Google treats native audio as a core capability across its consumer-facing products, letting users ask about and receive information on the physical world around them in a natural manner that wasn’t previously possible.
Improvements for Voice-Based Systems
For developers and enterprises building voice-based systems, Google says that the updated model improves reliability in several areas.
Google Search Live Gets a Gemini Audio Upgrade for Smoother Replies
Search Live is getting an upgrade with Gemini 2.5 native audio, delivering faster, more natural voice conversations and hands-free help in the Google app. Google is rolling out a new Gemini-powered audio upgrade to Search Live, turning voice queries into faster, more fluid, back-and-forth conversations. In an official announcement, the tech titan says the Gemini-powered feature is becoming available in the US on Android and iOS.
Rather than generating text first and converting it to speech, the model produces responses directly in audio, so replies are better paced and more consistent throughout a conversation. The new feature runs on Gemini 2.5 Flash Native Audio, Google’s updated model built specifically for live voice interactions.
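The contrast between a text-first pipeline and a native audio model can be illustrated with a toy sketch. This is not Google's implementation: the `Utterance` type, the `pace` field (a stand-in for prosody), and both functions are hypothetical names invented for illustration, and the "models" are simple stubs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Toy stand-in for a spoken turn (hypothetical, for illustration only)."""
    words: str
    pace: str  # stand-in for prosody, e.g. "slow" or "normal"


def cascaded_reply(user: Utterance) -> Utterance:
    """Text-first pipeline: speech-to-text, text model, text-to-speech."""
    # Step 1: transcription keeps only the words; the pace is discarded.
    text = user.words
    # Step 2: a text model produces the reply text (stubbed here).
    reply_text = f"reply to: {text}"
    # Step 3: synthesis has no pace information left, so it uses a default.
    return Utterance(words=reply_text, pace="normal")


def native_audio_reply(user: Utterance) -> Utterance:
    """Single audio-to-audio step: the whole utterance is visible at once."""
    # Because pace is never dropped, the reply can match how the user spoke.
    return Utterance(words=f"reply to: {user.words}", pace=user.pace)


if __name__ == "__main__":
    question = Utterance(words="what is this building", pace="slow")
    print(cascaded_reply(question))
    print(native_audio_reply(question))
```

In this toy model the text-first path always answers at the default pace, because the intermediate text carries no pacing information, while the single-step path can mirror the speaker. Real systems are far more complex; the sketch only shows why an intermediate text stage can lose information about how something was said.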
Smooth Conversational Translation
Real-time communication is an integral part of both professional and personal life. When speaking to people remotely across language barriers, it is difficult to truly connect by relying only on translated captions, which, even when state-of-the-art, lack the personality and real-time responsiveness essential for fluid conversation. Google is introducing an end-to-end speech-to-speech translation model that enables real-time translation in the original speaker’s voice with only a 2-second delay, bringing a long-imagined technology into reality and making cross-language communication more natural.
Beyond search and voice agents, the update introduces native support for “live speech-to-speech translation.” Gemini translates spoken language in real time, either by continuously translating ambient speech into a target language or by handling conversations between speakers of different languages in both directions. The system preserves vocal characteristics such as speech rhythm and emphasis, so the translation sounds smoother and more conversational.
Google highlights several capabilities that support the translation feature. Together they reduce setup friction and let translation occur passively during a conversation rather than through manual controls. As a result, the translation experience behaves like a person translating between two people.
Voice Search Realizing Google’s Aspirations
Google is continuously iterating toward its ideal of voice search, which was originally inspired by the science fiction voice interactions between humans and computers in the popular Star Trek television and movie series.