Microsoft has released three new proprietary AI models. For businesses building on Azure, they significantly change the cost and capability equation.
What Was Announced
On April 2, 2026, Microsoft AI announced three new MAI models, now available in Microsoft Foundry:
- MAI-Transcribe-1: State-of-the-art speech-to-text across the 25 most-used languages, running at 2.5x the speed of Azure’s existing Fast offering
- MAI-Voice-1: Natural voice generation with emotional range, capable of producing 60 seconds of audio in a single second, with support for custom voice cloning from just a few seconds of audio
- MAI-Image-2: Sharper image generation at twice the speed of its predecessor, already adopted at scale by WPP for campaign-ready creative production
Why This Matters for Your Business
The headline isn’t just the technology; it’s the positioning. Microsoft is competing directly on price-performance, not just capability.
- MAI-Transcribe-1: from $0.36 per hour
- MAI-Voice-1: $22 per 1M characters
- MAI-Image-2: $5 per 1M tokens (text input) and $33 per 1M tokens (image output)
For enterprises already operating on Azure, this means production-grade AI pipelines (voice, image, and transcription) can be deployed with predictable, competitive costs. That’s a meaningful shift from the pilot economics most companies have been working with.
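To make "predictable" concrete, here is a minimal sketch of a monthly cost estimate built from the list prices above. It assumes the transcription rate is billed per hour of audio and that every rate scales linearly with no volume tiers or regional differences; "starts at" pricing may behave differently in practice. The class name, function name, and workload volumes are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# List prices quoted in this article (USD). Assumption: linear pricing with
# no tiers, discounts, or regional variation.
TRANSCRIBE_PER_AUDIO_HOUR = 0.36        # MAI-Transcribe-1 (assumed per audio hour)
VOICE_PER_MILLION_CHARS = 22.00         # MAI-Voice-1
IMAGE_PER_MILLION_INPUT_TOKENS = 5.00   # MAI-Image-2, text input
IMAGE_PER_MILLION_OUTPUT_TOKENS = 33.00 # MAI-Image-2, image output


@dataclass
class MonthlyWorkload:
    """Hypothetical monthly volumes; field names are illustrative only."""
    audio_hours: float = 0.0
    voice_chars: int = 0
    image_input_tokens: int = 0
    image_output_tokens: int = 0


def estimate_monthly_cost(w: MonthlyWorkload) -> dict[str, float]:
    """Return the estimated cost per model plus a total, rounded to cents."""
    transcribe = w.audio_hours * TRANSCRIBE_PER_AUDIO_HOUR
    voice = w.voice_chars / 1_000_000 * VOICE_PER_MILLION_CHARS
    image = (
        w.image_input_tokens / 1_000_000 * IMAGE_PER_MILLION_INPUT_TOKENS
        + w.image_output_tokens / 1_000_000 * IMAGE_PER_MILLION_OUTPUT_TOKENS
    )
    costs = {"transcribe": transcribe, "voice": voice, "image": image}
    costs["total"] = sum(costs.values())
    return {name: round(value, 2) for name, value in costs.items()}


# Example volumes (made up): 1,000 audio hours, 5M synthesised characters,
# 2M text input tokens and 10M image output tokens per month.
example = MonthlyWorkload(
    audio_hours=1_000,
    voice_chars=5_000_000,
    image_input_tokens=2_000_000,
    image_output_tokens=10_000_000,
)
print(estimate_monthly_cost(example))
```

With these made-up volumes the estimate comes to roughly $810 a month: $360 for transcription, $110 for voice, and $340 for images. Swap in your own volumes to see which model dominates your bill before committing to a pilot.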
The Bigger Strategic Signal
These aren’t third-party integrations. They are Microsoft-built models, trained under what the company calls a “Humanist AI” philosophy: optimised for how people actually communicate, built for real-world conditions, and red-teamed for safety and compliance.
That matters for enterprise governance. As AI shifts from experimentation to board-level strategy, deploying models whose guardrails and enterprise-grade controls are built in, rather than bolted on afterwards, reduces risk and accelerates time-to-value.
Gartner predicts that 40% of enterprise applications will include task-specific AI agents by the end of 2026. The infrastructure to support that shift needs to be in place now.
What to Do Next
If you’re building or scaling AI capabilities inside your organisation, these models deserve a hands-on evaluation, especially if transcription, voice UX, or visual content generation are part of your roadmap.
Start with Microsoft Foundry or the MAI Playground (US access). If you want help assessing fit within your Microsoft ecosystem, Stellium’s AI practice can help you move quickly from evaluation to deployment.
The gap between organisations that experiment with AI and those that industrialise it is widening. Models like these make the case for moving faster.