Microsoft has launched three new AI models built entirely in-house: a speech transcription system, a
Source: Mastodon
Microsoft unveiled three new foundational AI models this week, marking the company’s first fully in‑house offering across speech, voice and image generation. The three models, MAI‑Transcribe‑1, MAI‑Voice‑1 and MAI‑Image‑2, debuted on Azure AI Foundry, Microsoft’s self‑service platform for custom models, and are already accessible to enterprise customers via the cloud.
MAI‑Transcribe‑1 is billed as having the lowest word‑error rate of any publicly disclosed system on the 25‑language FLEURS benchmark, positioning it as a direct challenger to OpenAI’s Whisper and Google’s Speech‑to‑Text service. MAI‑Voice‑1 delivers high‑fidelity, low‑latency text‑to‑speech with controllable speaker attributes, while MAI‑Image‑2 upgrades Microsoft’s image synthesis pipeline, offering faster generation and finer detail than the earlier DALL·E‑based Azure service.
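For readers unfamiliar with the metric, word‑error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only; the function name `word_error_rate` is hypothetical, and the text normalization (casing, punctuation, language‑specific tokenization) that a benchmark such as FLEURS would apply is assumed away. It is not Microsoft's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over four reference words.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because the denominator is the reference length, a transcript with many inserted words can score above 1.0, which is one reason the metric is usually reported alongside the normalization rules used.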
The launch signals a strategic pivot for Microsoft, which has relied heavily on OpenAI’s models for its Copilot suite and Azure OpenAI Service. By building a compact stack, with each model engineered by a team of fewer than ten engineers, the company reduces licensing costs, gains tighter integration with its own cloud infrastructure, and creates a “platform of platforms” that can be bundled with other Microsoft services such as Teams, Power Platform and Dynamics. The move also cushions Microsoft against potential pricing or policy shifts at OpenAI and Google, and gives it leverage in negotiations with enterprise clients demanding data‑sovereign solutions.
Looking ahead, the key question is how quickly Microsoft can scale these models to match the breadth of OpenAI’s ecosystem. Early adopters will test performance on real‑world workloads, while developers will probe the extensibility of Foundry’s fine‑tuning tools. Watch for announcements on model size expansions, multilingual voice capabilities, and integration of the new stack into upcoming Copilot features. The next few months will reveal whether Microsoft’s home‑grown AI suite can shift the balance of power in the multimodal AI market.