In a significant strategic shift, Microsoft AI has launched three proprietary foundation models, its most concrete step yet toward building an independent AI stack and a direct challenge to rivals such as Google and OpenAI. The models (MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2) cover speech-to-text, text-to-speech, and image generation, respectively, and are available immediately through Microsoft Foundry and a new MAI Playground platform.
The launch follows a pivotal contract renegotiation with OpenAI in late 2025. Under the original 2019 agreement, Microsoft was contractually blocked from independently pursuing artificial general intelligence (AGI). The revised terms, finalized as OpenAI sought to expand its compute footprint beyond Microsoft, free Microsoft to build its own frontier models while retaining license rights to OpenAI's developments through 2032. Microsoft AI CEO Mustafa Suleyman confirmed that the partnership with OpenAI remains intact, but said the renegotiation "enabled us to independently pursue our own superintelligence."
MAI-Transcribe-1 is positioned as the headliner, with Microsoft claiming best-in-class accuracy. The company says the model achieves the lowest average Word Error Rate (3.8%) on the FLEURS benchmark across the top 25 languages by Microsoft product usage, outperforming OpenAI's Whisper-large-v3 on all 25 languages and Google's Gemini 3.1 Flash on 22 of them. It processes audio files of up to 200 MB and is reportedly 2.5 times faster than Azure's existing offering, with initial testing underway inside Teams and Copilot Voice.
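For context, Word Error Rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. The Python sketch below illustrates that standard definition only. The function name `word_error_rate`, the whitespace tokenization, and the sample sentences are assumptions made for illustration; this is not Microsoft's or the FLEURS benchmark's evaluation code, which typically also normalizes text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: splits on whitespace and applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution over six reference words.
sample_wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

Under this definition, a 3.8% WER corresponds to roughly four word errors per 100 reference words.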
MAI-Voice-1 generates 60 seconds of natural-sounding audio in one second and supports custom voice creation from minimal sample audio; it is priced at $22 per million characters. MAI-Image-2 ranks in the top three on the Arena.ai leaderboard and costs $5 per million input tokens and $33 per million image output tokens. WPP is an early enterprise adopter.
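The quoted list prices translate into per-job costs with simple arithmetic. The following Python sketch is a back-of-the-envelope estimator that uses only the rates reported above. The function names and the example workload are hypothetical, and because the number of tokens billed per generated image is not stated here, token counts are left as inputs rather than assumed.

```python
# Rates as reported in this article (USD).
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0


def voice_cost_usd(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for synthesizing `characters` of text."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost for the given input and image output tokens."""
    return (
        input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS
        + output_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    )


# Hypothetical workload: a 500,000-character script read aloud.
script_cost = voice_cost_usd(500_000)  # half a million characters at $22 per million
```

For example, a 500,000-character script would cost about $11 at the reported MAI-Voice-1 rate.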
In a surprising detail, Microsoft said each model was built by a team of fewer than 10 engineers, a stark contrast to the industry trend of massive resource allocation. Suleyman attributed the performance gains to model architecture and data choices rather than headcount.
Microsoft is pricing aggressively, with rates explicitly designed to undercut Amazon and Google; Suleyman described the pricing as "the cheapest of any of the hyperscalers." The company also confirmed plans to build a frontier large language model, with the goal of full AI independence and state-of-the-art models across all modalities, supported by frontier-scale GPU clusters planned over the next 12 to 18 months.