Microsoft MAI Models: Seven New Models, Full Breakdown (June 2026)

Microsoft AI launched a family of seven new models on June 2, 2026, under a new lab umbrella called MAI (Microsoft AI). Announced by Mustafa Suleyman at Microsoft Build, these models cover reasoning, coding, image generation, speech-to-text, and text-to-speech.

Here's a complete breakdown of every model, with benchmarks, pricing, and what actually matters.

The MAI Lab: A New Chapter for Microsoft AI

The MAI lab was restructured from what was previously Microsoft's Superintelligence team. The new philosophy is summarized as "Humanist Superintelligence" — advanced AI designed to serve people and organizations, not replace them.

The lab's key differentiator is its "hill-climbing machine" approach: all models are trained from scratch on clean, commercially licensed data, without distillation from third-party models. Their flagship reasoning model, MAI-Thinking-1, explicitly trained without distillation — a departure from most frontier models that inherit capabilities from teacher models.

MAI models are available on Microsoft Foundry, OpenRouter, Fireworks, and Baseten. Three of the models (MAI-Code-1-Flash, MAI-Image-2.5, MAI-Voice-2) are also integrated into Microsoft's 1P products like GitHub Copilot, Visual Studio Code, PowerPoint, and OneDrive.

1. MAI-Thinking-1 — Flagship Reasoning Model

The flagship model of the MAI family. A sparse Mixture of Experts model designed for software engineering and mathematical reasoning.

Specs: 35B active parameters, ~1T total parameters

Key benchmarks:

SWE-Bench Pro: toe-to-toe with Claude Opus 4.6
AIME 2025: 97.0%
AIME 2026: 94.5%
Human side-by-side: preferred over Claude Sonnet 4.6 in blind evaluations by Surge's professional raters (1,276 tasks, single-turn and multi-turn)

What it's built for: Advanced mathematical reasoning, software engineering tasks, instruction following across multiple layers. Trained on clean, commercially licensed data with AI-generated content excluded from pre-training.

Context window: 256K tokens (enough for a 600-page document)

API: Chat Completions API compatible

Availability: Private preview on Microsoft Foundry; public preview on MAI Playground coming soon

Why it matters: At 35B active parameters, it's a medium-sized model that punches well above its weight. The smaller inference footprint means advanced coding assistance can be deployed more widely — not just for exceptional tasks but daily workflows. Being trained without distillation from third-party models is also notable — most competitors inherit intelligence from other labs' models, but MAI-Thinking-1 learns from the ground up.

2. MAI-Code-1-Flash — Agentic Coding Model

A 5B active parameter coding model built for speed and efficiency. Deeply integrated into GitHub Copilot and VS Code.

What it replaces: Claude Haiku 4.5

Key benchmarks (vs Claude Haiku 4.5):

SWE-Bench Pro: 51.2% vs 35.2% (+16 points)
SWE-Bench Verified: higher pass rate with up to 60% fewer tokens
Terminal Bench 2: outperforms
IF Bench (instruction following): +28.9 points ahead
Advanced Instruction Following: +14.5 points ahead
Adversarial reasoning (custom 186-question benchmark): 85.8% adjusted accuracy

Key features:

Adaptive thinking — stays concise for simple requests, spends more reasoning budget on complex tasks
Trained directly with GitHub Copilot harnesses used in production
Adaptive solution length control for better latency

Availability: Rolling out to GitHub Copilot individual users in VS Code. Available in the model picker and under the default auto picker.

Pricing: Positioned as comparable to Haiku but cheaper — exact pricing not publicly disclosed for Copilot users. Available on OpenRouter, Fireworks, and Baseten for developers.

Why it matters: This is the model that will power GitHub Copilot going forward. The +16-point lead on SWE-Bench Pro (51.2% vs 35.2%) against Haiku 4.5 is significant. The 60% token savings on SWE-Bench Verified means faster, cheaper interactions. If you use VS Code with Copilot, you'll likely see this model automatically.

3. MAI-Image-2.5 — Text-to-Image & Image Editing

Two variants: MAI-Image-2.5 (maximum fidelity) and MAI-Image-2.5-Flash (faster, lower-cost).

Arena rankings:

Text-to-image: No. 3 on Arena leaderboard
Image editing: No. 2 on Arena leaderboard (ahead of Nano Banana 2.1 and GPT-Image-1.5)

Capabilities:

High-quality text rendering (benchmark improvement: +107 points over MAI-Image-2)
Complex visual reasoning — understands scene structure, lighting, scale, spatial relationships
Fine-grained localized edits (replace objects, update text, remove motion blur)
Face and identity consistency across edits
Cartoon, Anime & Fantasy generation (benchmark improvement: +90 points over MAI-Image-2)

Pricing (Foundry API):

MAI-Image-2.5 (max fidelity): $5/1M text input tokens, $8/1M image input tokens, $47/1M image output tokens
MAI-Image-2.5-Flash: $1.75/1M text input, $1.75/1M image input, $19.50/1M image output

Availability: On Foundry today; integrated into PowerPoint (image generation) and OneDrive (image editing)

Also on: OpenRouter, Fireworks, Baseten

Why it matters: The Arena ranking is the key indicator here. No. 2 for image editing means this is competitive with the best available models, period. The price-to-performance is strong — Flash at $19.50/1M outputs is significantly cheaper than many competitors' flagship tiers.

4. MAI Transcribe-1.5 — Speech-to-Text

The world's most accurate multilingual transcription model, according to Microsoft.

Specs: 43 languages (up from 25), SOTA Word Error Rate (WER)

Key benchmarks:

FLEURS: best-in-class WER across 43 languages (#1)
Artificial Analysis leaderboard: #3 for WER, #1 for accuracy × speed

Key features:

Speed: transcribes an hour of audio in under 15 seconds — up to 5x faster than Gemini 3.1, Scribe v2, GPT-4o-Transcribe
Keyword Biasing: domain-specific terminology improves WER by up to 30%
Optimized for noisy environments
43 languages supported (new: 18 languages added)

Keyword Biasing example: Without biasing, the model misheard names like "Sean" as "Oif" and "Niamh" as "Societal." With keyword biasing, accuracy dramatically improves for specialized vocabulary.

Coming soon: Diarization (multi-speaker identification), native streaming API for real-time transcription

Availability: Integrated into Copilot, Teams, GitHub, and Dynamics 365 Contact Centre. Available on Foundry.

Pricing: Described as "the fastest, most efficient and most cost-effective transcription model of any hyper-scaler" — exact pricing TBD

Why it matters: For enterprises doing meeting transcription, customer service call analytics, or content localization, 5x speed with 43 languages is a serious capability. The Keyword Biasing feature is particularly useful for domain-specific use cases like medical, legal, or technical transcription.

5. MAI-Voice-2 — Text-to-Speech

The most expressive TTS model Microsoft has built. 15 languages, zero-shot voice cloning, granular emotion control.

Specs: 15 languages, zero-shot voice cloning (5-60s reference audio)

Key features:

Granular emotion control via emotion tags (sad, whispered, excited, etc.)
Zero-shot voice prompting across all 15 languages
Speaker identity stability for long-form content (audiobooks, podcasts)
Code-switching support: Hindi-English and Spanish-English (matches how people actually speak)
Preferred over MAI-Voice-1 in 72% of side-by-side listening tests
In a 2,222-response test, 45.5% preferred Voice-2, 44% preferred real human recordings (10.5% tie)

Supported languages: English (US/Australia), Italian, French, German, Hindi, Spanish (Spain/Mexico), Portuguese (Brazil/Portugal), Korean, Chinese (Simplified), Turkish, Russian, Thai, Dutch, Romanian, Hungarian

Availability: On Foundry; integrated into VS Code and Dynamics 365 Contact Centre

Safety: Consent enforced at system level — only authorized, licensed voices can be synthesized in production. No unlicensed voice cloning.

Also on: MAI Playground

Pricing: TBD

Why it matters: The 45.5% vs 44% result (human recordings) in the side-by-side test is striking — synthetic speech is essentially indistinguishable from real human speech for nearly half of listeners. The code-switching capability is unique among major TTS models.

6 & 7. MAI-Voice-2-Flash and the Seventh Model?

Mustafa Suleyman's keynote announced "a family of seven" but the individual model pages detail five distinct models: MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5 (with Flash variant), MAI Transcribe-1.5, and MAI-Voice-2 (with Flash variant coming soon).

The "seven" likely includes:

MAI-Thinking-1
MAI-Code-1-Flash
MAI-Image-2.5
MAI-Image-2.5-Flash
MAI Transcribe-1.5
MAI-Voice-2
MAI-Voice-2-Flash (announced as "coming soon")

MAI-Voice-2-Flash is described as "a lower cost, ultra-efficient package" — likely targeting high-volume production use cases where cost per character matters.

Microsoft Frontier Tuning: The Hidden Story

Beyond the individual models, the most strategically significant announcement is Microsoft Frontier Tuning — a reinforcement learning framework that lets organizations fine-tune MAI models on their own workflow data.

The claim is compelling: a tuned MAI model for Excel matched GPT 5.4 while being 10x more efficient. Early adopters saw "similar gains at the frontier" with the highest win rate of any tested model at roughly 10x lower cost.

This is essentially Microsoft's answer to the custom model trend — but instead of training from scratch, you're using reinforcement learning on real workplace traces. The data never leaves your environment.

Availability: Microsoft Foundry

How MAI Models Stack Up Against Competitors

Model Category	MAI	Top Competitor	MAI Advantage
Reasoning	MAI-Thinking-1 (35B active)	Claude Opus 4.6, Sonnet 4.6, GPT 5.4	Preferred over Sonnet 4.6 in blind tests; stronger at its weight class
Coding	MAI-Code-1-Flash (5B active)	Claude Haiku 4.5	+16 pts on SWE-Bench Pro; 60% fewer tokens
Image Generation	MAI-Image-2.5	DALL-E 3, Midjourney v7, Stable Diffusion 4	No. 3 Arena T2I; No. 2 Arena Edit
Speech-to-Text	MAI Transcribe-1.5	GPT-4o Transcribe, Gemini 3.1	5x faster; 43 languages vs ~10-20 for competitors
Text-to-Speech	MAI-Voice-2	ElevenLabs, OpenAI TTS	15 languages; code-switching; emotion control; 45.5% vs 44% human

Pricing Summary

Model	Pricing
MAI-Thinking-1	TBD (Foundry private preview)
MAI-Code-1-Flash	Comparable to Haiku, cheaper (via Copilot)
MAI-Image-2.5	$5/1M input text, $47/1M output images
MAI-Image-2.5-Flash	$1.75/1M input, $19.50/1M output
MAI Transcribe-1.5	TBD (competitive with hyper-scalers)
MAI-Voice-2	TBD
MAI-Voice-2-Flash	TBD (lower cost tier)

What This Means for the AI Landscape

The MAI launch is significant for several reasons:

Microsoft is building a full-stack AI lab. Not just one model, but a family covering reasoning, coding, image, speech-to-text, and text-to-speech — all sharing the same infrastructure and data standards.
No distillation policy. Training from scratch rather than inheriting from other labs' models is a bold stance. It means more steerability and long-term control, but potentially slower initial capability ramp.
Deep product integration. These aren't just API models — they're built directly into VS Code, GitHub Copilot, PowerPoint, OneDrive, Teams, and Dynamics 365. The distribution advantage is enormous.
Enterprise-focused positioning. Frontier Tuning, clean data lineage, consent guardrails, and compliance through Foundry — this is clearly aimed at enterprises that need auditability and control.
Third-party availability. Despite deep 1P integration, MAI models are also on OpenRouter, Fireworks, and Baseten — making them accessible to the broader developer ecosystem.

The "Humanist Superintelligence" framing is aspirational. But the models themselves are real and competitive — particularly MAI-Code-1-Flash for developers using GitHub Copilot, and MAI Transcribe-1.5 for enterprise meeting workflows.

Try It Yourself

MAI Playground: https://playground.microsoft.ai/chat
DuoAI (interactive demo with Voice 2 + Transcribe 1.5 + Image 2.5): https://playground.microsoft.ai/?model=duo-ai&duo-mode=conversation
Foundry API docs: Various model-specific documentation at learn.microsoft.com

Sources:

Microsoft AI: "Building a hill-climbing machine: Launching seven new MAI models" — https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
Microsoft AI: "Introducing MAI-Thinking-1" — https://microsoft.ai/news/introducing-mai-thinking-1/
Microsoft AI: "Introducing MAI-Code-1-Flash" — https://microsoft.ai/news/introducingmai-code-1-flash/
Microsoft AI: "MAI-Image-2.5 launches at No. 2 for image editing" — https://microsoft.ai/news/introducing-mai-image-2-5/
Microsoft AI: "Introducing MAI-Transcribe-1.5" — https://microsoft.ai/news/mai-transcribe-1-5more-accurate-context-aware-and-built-for-production/
Microsoft AI: "Introducing MAI-Voice-2" — https://microsoft.ai/news/mai-voice-2/
DEV Community: "Latest AI Model Releases: June 2026 Roundup" — https://dev.to/vjswamy/latest-ai-model-releases-june-2026-roundup-49j5

Stay Ahead of AI

Bookmark AI Tools Insight for honest, data-driven AI reviews and comparisons. No hype, just what works.

Microsoft MAI Models: Seven New Models, Full Breakdown (June 2026)

The MAI Lab: A New Chapter for Microsoft AI

1. MAI-Thinking-1 — Flagship Reasoning Model

2. MAI-Code-1-Flash — Agentic Coding Model

3. MAI-Image-2.5 — Text-to-Image & Image Editing

4. MAI Transcribe-1.5 — Speech-to-Text

5. MAI-Voice-2 — Text-to-Speech

6 & 7. MAI-Voice-2-Flash and the Seventh Model?

Microsoft Frontier Tuning: The Hidden Story

How MAI Models Stack Up Against Competitors

Pricing Summary

What This Means for the AI Landscape

Try It Yourself

Stay Ahead of AI

Comments & Danmaku