Google Releases Japanese‑Supported Speech Synthesis AI “Gemini 3.1 Flash TTS” – We Tested It; Emotion Can Be Controlled with Voice Tags – GIGAZINE
| Source: Mastodon | Original article
Google has added Japanese to its Gemini 3.1 Flash TTS engine, the company announced on Tuesday, and GIGAZINE has put the model through its own tests. The new voice synthesis service builds on the Flash‑type architecture unveiled earlier this year – a lightweight, low‑latency model designed for real‑time generation on consumer hardware – and now supports the full range of Japanese phonetics, pitch accents and honorific forms.
What sets the release apart is the ability to steer emotional tone with simple “voice tags” embedded in the prompt. By inserting markers such as <happy>, <sad> or <excited>, users can make the output sound more upbeat, somber or urgent without tweaking acoustic parameters manually. In GIGAZINE’s demo, the same sentence spoken with a “<joyful>” tag sounded markedly brighter than the neutral version, while a “<serious>” tag added a measured, authoritative cadence.
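To make the tagging idea concrete, here is a minimal Python sketch of how a prompt might be prepared before it is sent to a speech model. It assumes the angle‑bracket syntax described above and places the tag in front of the text; the helper names `tag_prompt` and `strip_tags`, the tag placement, and the list of accepted tags are illustrative assumptions rather than Google's API, and the exact syntax should be checked against the official documentation. No network call is made.

```python
import re
from typing import Optional

# Tags mentioned in the article. Whether this is the full supported
# set is an assumption made for illustration only.
KNOWN_TAGS = {"happy", "sad", "excited", "joyful", "serious"}


def tag_prompt(text: str, emotion: Optional[str] = None) -> str:
    """Prefix `text` with a voice tag such as <happy> (hypothetical helper).

    Returns the text unchanged when no emotion is given, which
    corresponds to the neutral reading.
    """
    if emotion is None:
        return text
    if emotion not in KNOWN_TAGS:
        raise ValueError(f"unknown voice tag: {emotion!r}")
    return f"<{emotion}>{text}"


def strip_tags(prompt: str) -> str:
    """Remove known voice tags, e.g. to recover a plain transcript."""
    pattern = r"<(?:" + "|".join(sorted(KNOWN_TAGS)) + r")>"
    return re.sub(pattern, "", prompt)


if __name__ == "__main__":
    sentence = "本日のニュースをお伝えします。"
    # Build neutral, joyful and serious variants of the same sentence,
    # mirroring the comparison described in the demo.
    for emotion in (None, "joyful", "serious"):
        print(tag_prompt(sentence, emotion))
```

Validating tags before sending the prompt is a design choice in this sketch: a misspelled tag would otherwise risk being read aloud as literal text or silently ignored, depending on how the model treats unknown markers.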
The release matters for two reasons. First, Japanese is the world’s third‑largest language market for voice assistants, and native‑level synthesis has been a blind spot for most Western‑origin AI providers. Gemini 3.1 Flash TTS narrows that gap, giving developers a tool that can be embedded in Android apps, Chrome extensions or on‑device services without relying on cloud calls. Second, the emotion‑tagging interface lowers the barrier for content creators, educators and accessibility tools to produce nuanced audio at scale, a capability that previously required separate prosody‑editing pipelines.
The rollout is currently limited to Google Cloud’s Vertex AI API, with a broader consumer‑facing integration expected later this year. As we reported on 15 April, Gemini 3.1’s text‑to‑speech model already offered high‑quality English output; the Japanese extension is the first major multilingual expansion.
What to watch next: the timing of the SDK that will let Android developers call Flash TTS locally, potential bundling with the Gemini 3.1 app for macOS announced on 16 April, and whether Google will expose the voice‑tag syntax in its upcoming Gemini 3.2 update. Competition from open‑source models such as Qwen3‑TTS‑Flash suggests the race for real‑time, emotionally aware speech synthesis is only heating up.