Google has launched Gemini 3.1 Flash TTS, a next-generation text-to-speech model that gives developers fine-grained control over AI-generated speech. Available through the Gemini API, Google AI Studio, Vertex AI, and Google Vids, the model accepts "audio tags" that adjust tone, rhythm, and accent, even mid-sentence. It supports more than 70 languages and embeds SynthID watermarks so that its output can be identified as AI-generated. The model ranks first on the Artificial Analysis TTS leaderboard with an Elo score of 1,211, and is designed to turn TTS into a programmable voice performance engine.
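
To make the idea of mid-sentence audio tags concrete, here is a minimal Python sketch of how a script with inline tags might be assembled before it is sent to a TTS model. The announcement does not specify the tag syntax, so the square-bracket form, the tag names (`excited`, `whispering`), and the helper `build_tagged_script` are all hypothetical placeholders. The sketch only builds a string and makes no API call; the official Gemini documentation is the place to check the real syntax.

```python
from typing import Iterable, Tuple

# ASSUMPTION: square-bracket tags and the tag names used below are
# illustrative placeholders, not confirmed Gemini 3.1 Flash TTS syntax.
Segment = Tuple[str, str]  # (tag, text); an empty tag means "no tag"


def build_tagged_script(segments: Iterable[Segment]) -> str:
    """Join (tag, text) pairs into a single script with inline tags."""
    parts = []
    for tag, text in segments:
        text = text.strip()
        if not text:
            continue  # skip empty segments rather than emit a dangling tag
        parts.append(f"[{tag}] {text}" if tag else text)
    return " ".join(parts)


# One sentence whose delivery changes partway through.
script = build_tagged_script([
    ("", "The results are in,"),
    ("excited", "and we won!"),
    ("whispering", "Don't tell anyone yet."),
])
print(script)
# prints: The results are in, [excited] and we won! [whispering] Don't tell anyone yet.
```

Keeping the tags as data, separate from the text, means the same script can be re-rendered with a different delivery by changing only the tag in each pair.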