Google's upgraded Gemini 2.5 Flash Native Audio model makes AI more conversational

The Gemini Audio graphic for Gemini 2.5 Flash Native Audio.
(Image credit: Google)

What you need to know

  • Gemini Live and Search Live are now using Google's new-and-improved Gemini 2.5 Flash Native Audio model.
  • The upgraded model is more conversational, can interact with external sources without impeding the chat's flow, and handles complex requests better.
  • It surpasses the previous 9-25 revision while also topping OpenAI's gpt-realtime model in benchmarks.

Gemini's voice agents are getting a major upgrade this week, as Google is updating the Gemini 2.5 Flash Native Audio model to improve its conversational sound, understanding of user instructions, and ability to fit into complex workflows. The latest Gemini 2.5 Flash Native Audio is rolling out now for developers in Google AI Studio and Vertex AI, and for Gemini Live and Search Live users.

The changes will make it easier to converse with Gemini while chatting live, and can improve the quality of Google's Live Voice Agents. Specifically, the new Gemini 2.5 Flash Native Audio 12-25 model improves multi-turn conversation quality. When you chat with Gemini Live across multiple turns, it'll remember context from earlier turns. The extra context helps create "more cohesive conversations," according to Google.

The model is also better at interacting with external workflows without disrupting the flow of your conversation. It can pick up on audio cues in your speech to figure out when to call these outside functions. These external workflows can provide real-time information that Gemini 2.5 Flash Native Audio can then work into its audio responses.
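For developers, the pattern described above is usually called function calling: the model asks for a named function, the app runs it, and the result is handed back to the model. The sketch below is a rough illustration of the app-side half only. It does not use Google's SDK, and every name in it (`get_weather`, `TOOLS`, `handle_tool_call`, the shape of the request dictionary) is a hypothetical stand-in rather than part of the Gemini API.

```python
from typing import Any, Callable, Dict


def get_weather(city: str) -> Dict[str, Any]:
    """Hypothetical tool; a real app would query a live weather service."""
    canned = {"Chicago": {"temp_f": 41, "conditions": "cloudy"}}
    return canned.get(city, {"temp_f": None, "conditions": "unknown"})


# Registry of functions the app is willing to run on the model's behalf.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"get_weather": get_weather}


def handle_tool_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run the requested function and package the result for the model."""
    name = call["name"]
    if name not in TOOLS:
        return {"name": name, "error": "unknown tool"}
    return {"name": name, "result": TOOLS[name](**call.get("args", {}))}


# Simulated request, as if the model decided mid-conversation it needs live data.
# The dictionary layout is an assumption made for this sketch.
request = {"name": "get_weather", "args": {"city": "Chicago"}}
response = handle_tool_call(request)
print(response)
```

In a real voice agent, the returned data would go back to the model, which would then speak the answer without the user noticing a break in the conversation.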

Gemini's Live Voice Agent is also better at understanding and acting upon complex instructions from a user. Google says these upgrades result in "higher user satisfaction on content completeness." In other words, when interacting with a Live Voice Agent powered by Gemini 2.5 Flash Native Audio 12-25, you may not need to demand to speak to a human representative. The artificial intelligence model might be able to handle more multi-step tasks on its own.

It's more reliable overall, with a 90% adherence rate to developer instructions. That's an increase of six percentage points compared to the older Gemini 2.5 Flash Native Audio 9-25 model.

The improvements made to Gemini 2.5 Flash Native Audio.
(Image credit: Google)

In the ComplexFuncBench Audio benchmark, the latest Gemini 2.5 Flash Native Audio model beats both its predecessor and OpenAI's gpt-realtime model with a score of 71.5%.

The upgraded Gemini 2.5 Flash Native Audio model and Live Voice Agents are available now in Google AI Studio and Vertex AI. The model is also debuting in preview in the Gemini API. Android users can find it in action in Gemini Live and Search Live, too.

Brady Snyder
Contributor

Brady is a tech journalist for Android Central, with a focus on news, phones, tablets, audio, wearables, and software. He has spent the last three years reporting and commenting on all things related to consumer technology for various publications. Brady graduated from St. John's University with a bachelor's degree in journalism. His work has been published in XDA, Android Police, Tech Advisor, iMore, Screen Rant, and Android Headlines. When he isn't experimenting with the latest tech, you can find Brady running or watching Big East basketball.
