Microsoft’s Free AI Just Beat OpenAI and Google at Browsing the Web – Decrypt

In brief

Fara1.5-27B scored 72% on Online-Mind2Web, beating OpenAI Operator (58.3%) and Gemini 2.5 Computer Use (57.3%).
The models are open-weight, come in 4 billion, 9 billion, and 27 billion parameter sizes, and are fine-tuned from Qwen3.5.
Fara1.5-9B is live now on Azure AI Foundry; 4B and 27B arrive shortly.

Imagine telling your computer to look up vacation rentals, compare five sites, fill out the booking form, and confirm the one closest to the beach. You go make coffee. It’s done when you get back. That is the promise of “computer use agents”—AI that reads your browser screen and clicks, scrolls, and types exactly as a human would, with no special plugins required.

OpenAI tried this first with Operator, launched in January 2025 at $200 a month before being folded into ChatGPT Agent and shut down in August. Google has Gemini 2.5 Computer Use. Both are proprietary, cloud-based, and expensive to run.

This week, Microsoft Research released a family of small models named Fara1.5, and on the benchmarks that count, it beats them both.

The family comes in three sizes: 4 billion, 9 billion, and 27 billion parameters, all built on Qwen3.5, an Alibaba base model that Microsoft fine-tuned for browser work, with all weights publicly released. (Parameters are the learned numerical values inside an AI model; more of them generally means a higher capacity.)

Getting there required rethinking the whole development process from scratch. “We started with a simple question: What does it take to make a small model genuinely good at agentic tasks?” the AI Frontiers team wrote. “The answer spanned the full lifecycle—data generation, training objectives, model design, and orchestration had to be redesigned together rather than in isolation.”

The benchmarks

Online-Mind2Web is the benchmark that matters most for the task Microsoft wanted to excel at. It measures how often an AI agent correctly completes 300 diverse, real-world tasks across 136 popular live websites, such as comparing products, filling out forms, and booking services. The score is the percentage of tasks finished correctly on the actual, changing internet.
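To make the scoring concrete, here is a minimal Python sketch of a task success rate. The function name and the pass/fail list are illustrative assumptions, not the benchmark's actual tooling, and the count of 216 is simply what 72% of 300 tasks works out to rather than a figure reported by Microsoft.

```python
def success_rate(outcomes):
    """Return the percentage of tasks judged successful."""
    if not outcomes:
        raise ValueError("no task outcomes to score")
    return 100.0 * sum(1 for ok in outcomes if ok) / len(outcomes)


# Hypothetical run over 300 tasks: 216 judged successful, 84 not.
outcomes = [True] * 216 + [False] * 84
score = success_rate(outcomes)
print(f"{score:.1f}%")  # prints "72.0%"
```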

Fara1.5-27B scored 72%. OpenAI Operator scored 58.3%. Google’s Gemini 2.5 Computer Use scored 57.3%. Yutori’s Navigator n1, the top proprietary alternative, reached 64.7%. Even Fara1.5-9B, the mid-sized model, hit 63.4%—ahead of both OpenAI and Google.

Open-source rivals also fell short. Alibaba’s GUI-Owl-1.5 at 8 billion parameters scored 48.6%. AI2’s MolmoWeb scored 35.3%. Microsoft’s own previous model, Fara-7B, scored 34.1%, which means Fara1.5-9B’s 63.4% is nearly double its predecessor’s score at a comparable size.

On WebVoyager, a second benchmark that measures task success on the live web and is scored the same way, Fara1.5-27B hit 88.6%, edging out OpenAI Operator’s 87.0% and beating H Company’s 30-billion-parameter Holo2 at 83.0%.

How it learned

The secret sauce is the training pipeline. Microsoft used a system called FaraGen1.5 to generate the training data. Here’s the clever part: they used GPT-5.4—OpenAI’s model—as a “teacher agent” to demonstrate how to complete browser tasks. Those demonstrations become the training data for Fara1.5. You’re essentially using OpenAI’s most capable model to train a rival open-source one.
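The teacher-student idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Step` and `Trajectory` types, the `to_training_examples` helper, and the choice to keep only successful demonstrations are hypothetical, and nothing below describes how FaraGen1.5 is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Step:
    observation: str  # what the teacher saw, e.g. a screen description
    action: str       # what the teacher did, e.g. "click(search_box)"


@dataclass
class Trajectory:
    task: str
    steps: list[Step]
    succeeded: bool


def to_training_examples(trajectories):
    """Turn teacher demonstrations into supervised (input, target) examples."""
    examples = []
    for traj in trajectories:
        if not traj.succeeded:
            continue  # assumption: failed demonstrations are discarded
        history = []
        for step in traj.steps:
            examples.append({
                "task": traj.task,
                "history": list(history),       # actions taken so far
                "observation": step.observation,
                "target_action": step.action,   # what the student should predict
            })
            history.append(step.action)
    return examples


# Hypothetical teacher output: one successful and one failed demonstration.
demos = [
    Trajectory("find a beach rental",
               [Step("search page", "type('beach rental')"),
                Step("results page", "click(first_result)")],
               succeeded=True),
    Trajectory("book a flight", [Step("home page", "click(ad)")], succeeded=False),
]
examples = to_training_examples(demos)
print(len(examples))
```

Each example pairs what the teacher saw with the action it took next, which is the kind of signal a smaller student model can be fine-tuned on.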

They also created six fake, fully functional replicas of real websites—email clients, calendars, marketplaces—so the model could practice tasks that require logins or irreversible actions (like actually sending an email or booking a flight) without touching real accounts. That’s called synthetic domain training, and it’s a significant part of why Fara1.5 handles “gated” tasks better than its predecessors.

Every model is designed to stop and ask before doing something it cannot undo. “Balancing robust safeguards such as Critical Points with seamless user journeys is key,” Yash Lara, Senior PM Lead at Microsoft Research, told VentureBeat. “Having a UI, like Microsoft Research’s Magentic-UI, is vital for giving users opportunities to intervene when necessary, while also helping to avoid approval fatigue.”

That matters because OpenAI was not subtle about the risks when it launched ChatGPT Agent. “When you sign ChatGPT agent into websites or enable connectors, it will be able to access sensitive data from those sources, such as emails, files, or account information,” the company wrote.

Fara1.5 runs everything through MagenticLite, a sandboxed browser environment that logs every action and lets users halt the agent at any point.
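The stop-and-ask behavior, combined with an action log, can be illustrated with a short Python sketch. The action names, the `IRREVERSIBLE` set, and the `run_with_critical_points` function are hypothetical and are not MagenticLite's or Fara1.5's actual interface; the sketch only shows the general pattern of pausing for approval before an action that cannot be undone.

```python
# Assumption: a fixed set of action names that count as irreversible.
IRREVERSIBLE = {"send_email", "submit_payment", "confirm_booking"}


def run_with_critical_points(actions, approve):
    """Log each action; ask for approval before any irreversible one."""
    log = []
    for action in actions:
        if action in IRREVERSIBLE and not approve(action):
            log.append(("halted", action))  # user declined, so stop here
            break
        log.append(("executed", action))
    return log


plan = ["open_inbox", "type_reply", "send_email", "close_tab"]

# A user who declines every irreversible step.
declined = run_with_critical_points(plan, approve=lambda action: False)

# A user who approves every irreversible step.
approved = run_with_critical_points(plan, approve=lambda action: True)

print(declined)
print(approved)
```

Asking only at irreversible steps, rather than at every click, is one way to limit the approval fatigue Lara describes.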

Browser AI has become a crowded race, with Google’s Gemini in Chrome, Perplexity’s Comet, and Anthropic’s Claude for Chrome all competing. Fara1.5’s edge is that it is open: public weights, open inference code on GitHub, and the ability to run on hardware you control. Fara1.5-9B is live now on Azure AI Foundry; the 4B and 27B variants arrive shortly. Microsoft says it plans to expand Fara1.5 beyond the browser and into desktop and enterprise software next.
