Microsoft AI launched a family of seven new models on June 2, 2026, under a new lab umbrella called MAI (Microsoft AI). Announced by Mustafa Suleyman at Microsoft Build, these models cover reasoning, coding, image generation, speech-to-text, and text-to-speech.
Here's a complete breakdown of every model, with benchmarks, pricing, and what actually matters.
The MAI Lab: A New Chapter for Microsoft AI
The MAI lab was restructured from what was previously Microsoft's Superintelligence team. The new philosophy is summarized as "Humanist Superintelligence" — advanced AI designed to serve people and organizations, not replace them.
The lab's key differentiator is its "hill-climbing machine" approach: all models are trained from scratch on clean, commercially licensed data, without distillation from third-party models. Their flagship reasoning model, MAI-Thinking-1, explicitly trained without distillation — a departure from most frontier models that inherit capabilities from teacher models.
MAI models are available on Microsoft Foundry, OpenRouter, Fireworks, and Baseten. Three of the models (MAI-Code-1-Flash, MAI-Image-2.5, MAI-Voice-2) are also integrated into Microsoft's 1P products like GitHub Copilot, Visual Studio Code, PowerPoint, and OneDrive.
1. MAI-Thinking-1 — Flagship Reasoning Model
The flagship model of the MAI family. A sparse Mixture of Experts model designed for software engineering and mathematical reasoning.
Specs: 35B active parameters, ~1T total parameters
Key benchmarks:
- SWE-Bench Pro: toe-to-toe with Claude Opus 4.6
- AIME 2025: 97.0%
- AIME 2026: 94.5%
- Human side-by-side: preferred over Claude Sonnet 4.6 in blind evaluations by Surge's professional raters (1,276 tasks, single-turn and multi-turn)
What it's built for: Advanced mathematical reasoning, software engineering tasks, instruction following across multiple layers. Trained on clean, commercially licensed data with AI-generated content excluded from pre-training.
Context window: 256K tokens (enough for a 600-page document)
API: Chat Completions API compatible
Availability: Private preview on Microsoft Foundry; public preview on MAI Playground coming soon
Why it matters: At 35B active parameters, it's a medium-sized model that punches well above its weight. The smaller inference footprint means advanced coding assistance can be deployed more widely — not just for exceptional tasks but daily workflows. Being trained without distillation from third-party models is also notable — most competitors inherit intelligence from other labs' models, but MAI-Thinking-1 learns from the ground up.
2. MAI-Code-1-Flash — Agentic Coding Model
A 5B active parameter coding model built for speed and efficiency. Deeply integrated into GitHub Copilot and VS Code.
What it replaces: Claude Haiku 4.5
Key benchmarks (vs Claude Haiku 4.5):
- SWE-Bench Pro: 51.2% vs 35.2% (+16 points)
- SWE-Bench Verified: higher pass rate with up to 60% fewer tokens
- Terminal Bench 2: outperforms
- IF Bench (instruction following): +28.9 points ahead
- Advanced Instruction Following: +14.5 points ahead
- Adversarial reasoning (custom 186-question benchmark): 85.8% adjusted accuracy
Key features:
- Adaptive thinking — stays concise for simple requests, spends more reasoning budget on complex tasks
- Trained directly with GitHub Copilot harnesses used in production
- Adaptive solution length control for better latency
Availability: Rolling out to GitHub Copilot individual users in VS Code. Available in the model picker and under the default auto picker.
Pricing: Positioned as comparable to Haiku but cheaper — exact pricing not publicly disclosed for Copilot users. Available on OpenRouter, Fireworks, and Baseten for developers.
Why it matters: This is the model that will power GitHub Copilot going forward. The +16-point lead on SWE-Bench Pro (51.2% vs 35.2%) against Haiku 4.5 is significant. The 60% token savings on SWE-Bench Verified means faster, cheaper interactions. If you use VS Code with Copilot, you'll likely see this model automatically.
3. MAI-Image-2.5 — Text-to-Image & Image Editing
Two variants: MAI-Image-2.5 (maximum fidelity) and MAI-Image-2.5-Flash (faster, lower-cost).
Arena rankings:
- Text-to-image: No. 3 on Arena leaderboard
- Image editing: No. 2 on Arena leaderboard (ahead of Nano Banana 2.1 and GPT-Image-1.5)
Capabilities:
- High-quality text rendering (benchmark improvement: +107 points over MAI-Image-2)
- Complex visual reasoning — understands scene structure, lighting, scale, spatial relationships
- Fine-grained localized edits (replace objects, update text, remove motion blur)
- Face and identity consistency across edits
- Cartoon, Anime & Fantasy generation (benchmark improvement: +90 points over MAI-Image-2)
Pricing (Foundry API):
- MAI-Image-2.5 (max fidelity): $5/1M text input tokens, $8/1M image input tokens, $47/1M image output tokens
- MAI-Image-2.5-Flash: $1.75/1M text input, $1.75/1M image input, $19.50/1M image output
Availability: On Foundry today; integrated into PowerPoint (image generation) and OneDrive (image editing)
Also on: OpenRouter, Fireworks, Baseten
Why it matters: The Arena ranking is the key indicator here. No. 2 for image editing means this is competitive with the best available models, period. The price-to-performance is strong — Flash at $19.50/1M outputs is significantly cheaper than many competitors' flagship tiers.
4. MAI Transcribe-1.5 — Speech-to-Text
The world's most accurate multilingual transcription model, according to Microsoft.
Specs: 43 languages (up from 25), SOTA Word Error Rate (WER)
Key benchmarks:
- FLEURS: best-in-class WER across 43 languages (#1)
- Artificial Analysis leaderboard: #3 for WER, #1 for accuracy × speed
Key features:
- Speed: transcribes an hour of audio in under 15 seconds — up to 5x faster than Gemini 3.1, Scribe v2, GPT-4o-Transcribe
- Keyword Biasing: domain-specific terminology improves WER by up to 30%
- Optimized for noisy environments
- 43 languages supported (new: 18 languages added)
Keyword Biasing example: Without biasing, the model misheard names like "Sean" as "Oif" and "Niamh" as "Societal." With keyword biasing, accuracy dramatically improves for specialized vocabulary.
Coming soon: Diarization (multi-speaker identification), native streaming API for real-time transcription
Availability: Integrated into Copilot, Teams, GitHub, and Dynamics 365 Contact Centre. Available on Foundry.
Pricing: Described as "the fastest, most efficient and most cost-effective transcription model of any hyper-scaler" — exact pricing TBD
Why it matters: For enterprises doing meeting transcription, customer service call analytics, or content localization, 5x speed with 43 languages is a serious capability. The Keyword Biasing feature is particularly useful for domain-specific use cases like medical, legal, or technical transcription.
5. MAI-Voice-2 — Text-to-Speech
The most expressive TTS model Microsoft has built. 15 languages, zero-shot voice cloning, granular emotion control.
Specs: 15 languages, zero-shot voice cloning (5-60s reference audio)
Key features:
- Granular emotion control via emotion tags (sad, whispered, excited, etc.)
- Zero-shot voice prompting across all 15 languages
- Speaker identity stability for long-form content (audiobooks, podcasts)
- Code-switching support: Hindi-English and Spanish-English (matches how people actually speak)
- Preferred over MAI-Voice-1 in 72% of side-by-side listening tests
- In a 2,222-response test, 45.5% preferred Voice-2, 44% preferred real human recordings (10.5% tie)
Supported languages: English (US/Australia), Italian, French, German, Hindi, Spanish (Spain/Mexico), Portuguese (Brazil/Portugal), Korean, Chinese (Simplified), Turkish, Russian, Thai, Dutch, Romanian, Hungarian
Availability: On Foundry; integrated into VS Code and Dynamics 365 Contact Centre
Safety: Consent enforced at system level — only authorized, licensed voices can be synthesized in production. No unlicensed voice cloning.
Also on: MAI Playground
Pricing: TBD
Why it matters: The 45.5% vs 44% result (human recordings) in the side-by-side test is striking — synthetic speech is essentially indistinguishable from real human speech for nearly half of listeners. The code-switching capability is unique among major TTS models.
6 & 7. MAI-Voice-2-Flash and the Seventh Model?
Mustafa Suleyman's keynote announced "a family of seven" but the individual model pages detail five distinct models: MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5 (with Flash variant), MAI Transcribe-1.5, and MAI-Voice-2 (with Flash variant coming soon).
The "seven" likely includes:
- MAI-Thinking-1
- MAI-Code-1-Flash
- MAI-Image-2.5
- MAI-Image-2.5-Flash
- MAI Transcribe-1.5
- MAI-Voice-2
- MAI-Voice-2-Flash (announced as "coming soon")
MAI-Voice-2-Flash is described as "a lower cost, ultra-efficient package" — likely targeting high-volume production use cases where cost per character matters.
Microsoft Frontier Tuning: The Hidden Story
Beyond the individual models, the most strategically significant announcement is Microsoft Frontier Tuning — a reinforcement learning framework that lets organizations fine-tune MAI models on their own workflow data.
The claim is compelling: a tuned MAI model for Excel matched GPT 5.4 while being 10x more efficient. Early adopters saw "similar gains at the frontier" with the highest win rate of any tested model at roughly 10x lower cost.
This is essentially Microsoft's answer to the custom model trend — but instead of training from scratch, you're using reinforcement learning on real workplace traces. The data never leaves your environment.
Availability: Microsoft Foundry
How MAI Models Stack Up Against Competitors
| Model Category | MAI | Top Competitor | MAI Advantage |
|---|---|---|---|
| Reasoning | MAI-Thinking-1 (35B active) | Claude Opus 4.6, Sonnet 4.6, GPT 5.4 | Preferred over Sonnet 4.6 in blind tests; stronger at its weight class |
| Coding | MAI-Code-1-Flash (5B active) | Claude Haiku 4.5 | +16 pts on SWE-Bench Pro; 60% fewer tokens |
| Image Generation | MAI-Image-2.5 | DALL-E 3, Midjourney v7, Stable Diffusion 4 | No. 3 Arena T2I; No. 2 Arena Edit |
| Speech-to-Text | MAI Transcribe-1.5 | GPT-4o Transcribe, Gemini 3.1 | 5x faster; 43 languages vs ~10-20 for competitors |
| Text-to-Speech | MAI-Voice-2 | ElevenLabs, OpenAI TTS | 15 languages; code-switching; emotion control; 45.5% vs 44% human |
Pricing Summary
| Model | Pricing |
|---|---|
| MAI-Thinking-1 | TBD (Foundry private preview) |
| MAI-Code-1-Flash | Comparable to Haiku, cheaper (via Copilot) |
| MAI-Image-2.5 | $5/1M input text, $47/1M output images |
| MAI-Image-2.5-Flash | $1.75/1M input, $19.50/1M output |
| MAI Transcribe-1.5 | TBD (competitive with hyper-scalers) |
| MAI-Voice-2 | TBD |
| MAI-Voice-2-Flash | TBD (lower cost tier) |
What This Means for the AI Landscape
The MAI launch is significant for several reasons:
Microsoft is building a full-stack AI lab. Not just one model, but a family covering reasoning, coding, image, speech-to-text, and text-to-speech — all sharing the same infrastructure and data standards.
No distillation policy. Training from scratch rather than inheriting from other labs' models is a bold stance. It means more steerability and long-term control, but potentially slower initial capability ramp.
Deep product integration. These aren't just API models — they're built directly into VS Code, GitHub Copilot, PowerPoint, OneDrive, Teams, and Dynamics 365. The distribution advantage is enormous.
Enterprise-focused positioning. Frontier Tuning, clean data lineage, consent guardrails, and compliance through Foundry — this is clearly aimed at enterprises that need auditability and control.
Third-party availability. Despite deep 1P integration, MAI models are also on OpenRouter, Fireworks, and Baseten — making them accessible to the broader developer ecosystem.
The "Humanist Superintelligence" framing is aspirational. But the models themselves are real and competitive — particularly MAI-Code-1-Flash for developers using GitHub Copilot, and MAI Transcribe-1.5 for enterprise meeting workflows.
Try It Yourself
- MAI Playground: https://playground.microsoft.ai/chat
- DuoAI (interactive demo with Voice 2 + Transcribe 1.5 + Image 2.5): https://playground.microsoft.ai/?model=duo-ai&duo-mode=conversation
- Foundry API docs: Various model-specific documentation at learn.microsoft.com
Sources:
- Microsoft AI: "Building a hill-climbing machine: Launching seven new MAI models" — https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
- Microsoft AI: "Introducing MAI-Thinking-1" — https://microsoft.ai/news/introducing-mai-thinking-1/
- Microsoft AI: "Introducing MAI-Code-1-Flash" — https://microsoft.ai/news/introducingmai-code-1-flash/
- Microsoft AI: "MAI-Image-2.5 launches at No. 2 for image editing" — https://microsoft.ai/news/introducing-mai-image-2-5/
- Microsoft AI: "Introducing MAI-Transcribe-1.5" — https://microsoft.ai/news/mai-transcribe-1-5more-accurate-context-aware-and-built-for-production/
- Microsoft AI: "Introducing MAI-Voice-2" — https://microsoft.ai/news/mai-voice-2/
- DEV Community: "Latest AI Model Releases: June 2026 Roundup" — https://dev.to/vjswamy/latest-ai-model-releases-june-2026-roundup-49j5
Stay Ahead of AI
Bookmark AI Tools Insight for honest, data-driven AI reviews and comparisons. No hype, just what works.
Subscribe
Comments & Danmaku
Leave a comment — it flies across the page as danmaku!