Microsoft MAI Models: Seven New Models, Full Breakdown (June 2026)

PJ • 2026-06-05 • Microsoft MAI MAI-Thinking-1 MAI-Code-1-Flash MAI-Image-2.5 MAI-Transcribe MAI-Voice AI Models June 2026

Microsoft AI launched a family of seven new models on June 2, 2026, under a new lab umbrella called MAI (Microsoft AI). Announced by Mustafa Suleyman at Microsoft Build, these models cover reasoning, coding, image generation, speech-to-text, and text-to-speech.

Here's a complete breakdown of every model, with benchmarks, pricing, and what actually matters.

The MAI Lab: A New Chapter for Microsoft AI

The MAI lab was restructured from what was previously Microsoft's Superintelligence team. The new philosophy is summarized as "Humanist Superintelligence" — advanced AI designed to serve people and organizations, not replace them.

The lab's key differentiator is its "hill-climbing machine" approach: all models are trained from scratch on clean, commercially licensed data, without distillation from third-party models. Their flagship reasoning model, MAI-Thinking-1, explicitly trained without distillation — a departure from most frontier models that inherit capabilities from teacher models.

MAI models are available on Microsoft Foundry, OpenRouter, Fireworks, and Baseten. Three of the models (MAI-Code-1-Flash, MAI-Image-2.5, MAI-Voice-2) are also integrated into Microsoft's 1P products like GitHub Copilot, Visual Studio Code, PowerPoint, and OneDrive.


1. MAI-Thinking-1 — Flagship Reasoning Model

The flagship model of the MAI family. A sparse Mixture of Experts model designed for software engineering and mathematical reasoning.

Specs: 35B active parameters, ~1T total parameters

Key benchmarks:

What it's built for: Advanced mathematical reasoning, software engineering tasks, instruction following across multiple layers. Trained on clean, commercially licensed data with AI-generated content excluded from pre-training.

Context window: 256K tokens (enough for a 600-page document)

API: Chat Completions API compatible

Availability: Private preview on Microsoft Foundry; public preview on MAI Playground coming soon

Why it matters: At 35B active parameters, it's a medium-sized model that punches well above its weight. The smaller inference footprint means advanced coding assistance can be deployed more widely — not just for exceptional tasks but daily workflows. Being trained without distillation from third-party models is also notable — most competitors inherit intelligence from other labs' models, but MAI-Thinking-1 learns from the ground up.


2. MAI-Code-1-Flash — Agentic Coding Model

A 5B active parameter coding model built for speed and efficiency. Deeply integrated into GitHub Copilot and VS Code.

What it replaces: Claude Haiku 4.5

Key benchmarks (vs Claude Haiku 4.5):

Key features:

Availability: Rolling out to GitHub Copilot individual users in VS Code. Available in the model picker and under the default auto picker.

Pricing: Positioned as comparable to Haiku but cheaper — exact pricing not publicly disclosed for Copilot users. Available on OpenRouter, Fireworks, and Baseten for developers.

Why it matters: This is the model that will power GitHub Copilot going forward. The +16-point lead on SWE-Bench Pro (51.2% vs 35.2%) against Haiku 4.5 is significant. The 60% token savings on SWE-Bench Verified means faster, cheaper interactions. If you use VS Code with Copilot, you'll likely see this model automatically.


3. MAI-Image-2.5 — Text-to-Image & Image Editing

Two variants: MAI-Image-2.5 (maximum fidelity) and MAI-Image-2.5-Flash (faster, lower-cost).

Arena rankings:

Capabilities:

Pricing (Foundry API):

Availability: On Foundry today; integrated into PowerPoint (image generation) and OneDrive (image editing)

Also on: OpenRouter, Fireworks, Baseten

Why it matters: The Arena ranking is the key indicator here. No. 2 for image editing means this is competitive with the best available models, period. The price-to-performance is strong — Flash at $19.50/1M outputs is significantly cheaper than many competitors' flagship tiers.


4. MAI Transcribe-1.5 — Speech-to-Text

The world's most accurate multilingual transcription model, according to Microsoft.

Specs: 43 languages (up from 25), SOTA Word Error Rate (WER)

Key benchmarks:

Key features:

Keyword Biasing example: Without biasing, the model misheard names like "Sean" as "Oif" and "Niamh" as "Societal." With keyword biasing, accuracy dramatically improves for specialized vocabulary.

Coming soon: Diarization (multi-speaker identification), native streaming API for real-time transcription

Availability: Integrated into Copilot, Teams, GitHub, and Dynamics 365 Contact Centre. Available on Foundry.

Pricing: Described as "the fastest, most efficient and most cost-effective transcription model of any hyper-scaler" — exact pricing TBD

Why it matters: For enterprises doing meeting transcription, customer service call analytics, or content localization, 5x speed with 43 languages is a serious capability. The Keyword Biasing feature is particularly useful for domain-specific use cases like medical, legal, or technical transcription.


5. MAI-Voice-2 — Text-to-Speech

The most expressive TTS model Microsoft has built. 15 languages, zero-shot voice cloning, granular emotion control.

Specs: 15 languages, zero-shot voice cloning (5-60s reference audio)

Key features:

Supported languages: English (US/Australia), Italian, French, German, Hindi, Spanish (Spain/Mexico), Portuguese (Brazil/Portugal), Korean, Chinese (Simplified), Turkish, Russian, Thai, Dutch, Romanian, Hungarian

Availability: On Foundry; integrated into VS Code and Dynamics 365 Contact Centre

Safety: Consent enforced at system level — only authorized, licensed voices can be synthesized in production. No unlicensed voice cloning.

Also on: MAI Playground

Pricing: TBD

Why it matters: The 45.5% vs 44% result (human recordings) in the side-by-side test is striking — synthetic speech is essentially indistinguishable from real human speech for nearly half of listeners. The code-switching capability is unique among major TTS models.


6 & 7. MAI-Voice-2-Flash and the Seventh Model?

Mustafa Suleyman's keynote announced "a family of seven" but the individual model pages detail five distinct models: MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5 (with Flash variant), MAI Transcribe-1.5, and MAI-Voice-2 (with Flash variant coming soon).

The "seven" likely includes:

  1. MAI-Thinking-1
  2. MAI-Code-1-Flash
  3. MAI-Image-2.5
  4. MAI-Image-2.5-Flash
  5. MAI Transcribe-1.5
  6. MAI-Voice-2
  7. MAI-Voice-2-Flash (announced as "coming soon")

MAI-Voice-2-Flash is described as "a lower cost, ultra-efficient package" — likely targeting high-volume production use cases where cost per character matters.


Microsoft Frontier Tuning: The Hidden Story

Beyond the individual models, the most strategically significant announcement is Microsoft Frontier Tuning — a reinforcement learning framework that lets organizations fine-tune MAI models on their own workflow data.

The claim is compelling: a tuned MAI model for Excel matched GPT 5.4 while being 10x more efficient. Early adopters saw "similar gains at the frontier" with the highest win rate of any tested model at roughly 10x lower cost.

This is essentially Microsoft's answer to the custom model trend — but instead of training from scratch, you're using reinforcement learning on real workplace traces. The data never leaves your environment.

Availability: Microsoft Foundry


How MAI Models Stack Up Against Competitors

Model Category MAI Top Competitor MAI Advantage
Reasoning MAI-Thinking-1 (35B active) Claude Opus 4.6, Sonnet 4.6, GPT 5.4 Preferred over Sonnet 4.6 in blind tests; stronger at its weight class
Coding MAI-Code-1-Flash (5B active) Claude Haiku 4.5 +16 pts on SWE-Bench Pro; 60% fewer tokens
Image Generation MAI-Image-2.5 DALL-E 3, Midjourney v7, Stable Diffusion 4 No. 3 Arena T2I; No. 2 Arena Edit
Speech-to-Text MAI Transcribe-1.5 GPT-4o Transcribe, Gemini 3.1 5x faster; 43 languages vs ~10-20 for competitors
Text-to-Speech MAI-Voice-2 ElevenLabs, OpenAI TTS 15 languages; code-switching; emotion control; 45.5% vs 44% human

Pricing Summary

Model Pricing
MAI-Thinking-1 TBD (Foundry private preview)
MAI-Code-1-Flash Comparable to Haiku, cheaper (via Copilot)
MAI-Image-2.5 $5/1M input text, $47/1M output images
MAI-Image-2.5-Flash $1.75/1M input, $19.50/1M output
MAI Transcribe-1.5 TBD (competitive with hyper-scalers)
MAI-Voice-2 TBD
MAI-Voice-2-Flash TBD (lower cost tier)

What This Means for the AI Landscape

The MAI launch is significant for several reasons:

  1. Microsoft is building a full-stack AI lab. Not just one model, but a family covering reasoning, coding, image, speech-to-text, and text-to-speech — all sharing the same infrastructure and data standards.

  2. No distillation policy. Training from scratch rather than inheriting from other labs' models is a bold stance. It means more steerability and long-term control, but potentially slower initial capability ramp.

  3. Deep product integration. These aren't just API models — they're built directly into VS Code, GitHub Copilot, PowerPoint, OneDrive, Teams, and Dynamics 365. The distribution advantage is enormous.

  4. Enterprise-focused positioning. Frontier Tuning, clean data lineage, consent guardrails, and compliance through Foundry — this is clearly aimed at enterprises that need auditability and control.

  5. Third-party availability. Despite deep 1P integration, MAI models are also on OpenRouter, Fireworks, and Baseten — making them accessible to the broader developer ecosystem.

The "Humanist Superintelligence" framing is aspirational. But the models themselves are real and competitive — particularly MAI-Code-1-Flash for developers using GitHub Copilot, and MAI Transcribe-1.5 for enterprise meeting workflows.


Try It Yourself


Sources:

Stay Ahead of AI

Bookmark AI Tools Insight for honest, data-driven AI reviews and comparisons. No hype, just what works.

Subscribe

Comments & Danmaku

Leave a comment — it flies across the page as danmaku!