Rio de Janeiro Built an AI Model That Beat DeepSeek—But Was Based on Someone Else’s Work – Decrypt


Rio de Janeiro Built an AI Model That Beat DeepSeek—But Was Based on Someone Else’s Work – Decrypt


In brief

  • IplanRIO released Rio 3.5 Open 397B on June 13, billing it as a government-built frontier AI model with benchmark scores topping Qwen 3.7 Plus.
  • AI company Nex published a mathematical proof showing the model is a direct 0.6 Nex / 0.4 Qwen weight merge.
  • IplanRIO updated the model card, credited Nex, pulled the benchmark claims, and blamed an “incorrect upload.”

Rio de Janeiro’s IplanRIO released Rio 3.5 on June 13. The city’s IT agency called it a frontier-class model: 397 billion parameters, with a permissive open-source license, built by the municipal government of a city in the Global South.

Rio 3.5’s launch timing was perfect: Brazil was playing its World Cup opener, and social media was already on fire. Comments about it rapidly spread from Brazil to beyond.

But just as quickly as it gained attention, there was a dispute over who exactly created the model.

The original model card described Rio 3.5 as a post-train of Qwen 3.5 397B, Alibaba’s open-base model, with a new reasoning layer called SwiReasoning added on top. The development cost was reported at R$500,000 (Rio didn’t confirm this), or nearly $100,000 USD—roughly 30 times cheaper than equivalent off-the-shelf AI systems.

The architecture is Mixture-of-Experts, which means only around 17 billion of the 397 billion parameters fire on any given token. That makes inference cheaper than the headline size suggests. The model also supports vision and text, handles over a dozen languages, and ships under a fully open MIT license.

SwiReasoning is the technical centerpiece. It’s a training-free inference framework that switches dynamically between two modes. When the model is confident about a next word—low entropy in the probability distribution—it reasons in plain language. When uncertain, it shifts to latent reasoning, thinking in hidden internal states without emitting tokens. IplanRIO said Rio 3.5 was specifically trained to exploit this, and that the gains show up in the benchmark numbers.

The self-reported numbers were eye-catching. Terminal-Bench 2.1—which measures autonomous terminal command execution, scored as percentage of tasks passed—came in at 70.8% for Rio 3.5, edging out Qwen 3.7 Plus at 70.3% and the powerful DeepSeek v4 Pro at 67.9%.

On IMOAnswerBench, a math olympiad benchmark scored as percentage correct, Rio 3.5 hit 89.5%. On HLE—Humanity’s Last Exam, a near-unsolvable multi-domain expert battery scored as a percentage—Rio 3.5 landed at 36.5%, ahead of Qwen 3.7 Plus’s 34.7%.

A municipal government beating the most important flagship models on the most meaningful quality benchmarks: That’s the headline that spread, especially after the Mayor of Rio de Janeiro tweeted about it.

“An open AI model trained in Rio and publicly funded over the last year by [the Municipality of Rio] has just surpassed all other models,” Eduardo Cavaliere wrote. “Today, the world is talking about an open AI model trained in Rio.”

Then Nex showed up

“Trained in Rio” proved to be not entirely accurate.

Nex-AGI, a Shanghai-based open-source AI alliance, posted on X days after the release. The opener: “The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Pro, wearing a different hat.”

They’d analyzed the weights. The math was exact: Rio 3.5 ≈ 0.6 × Nex N2 Pro + 0.4 × Qwen 3.5. A verification script and a full GitHub report followed.

The evidence came in two parts.

First, behavioral. Nex stripped the hardcoded “You are Rio” system prompt from the deployed model and sent it 120 identity questions. Without the mask, Nex reports the model called itself “Nex, from Nex-AGI” 79.2% of the time. It called itself “Rio” exactly 0% of the time. Nex said the model also recited the company’s specific backstory verbatim, mentioning the “Shanghai Innovation Institute” and “a large-model ecosystem alliance.” That’s Nex’s own training data, surfacing in someone else’s model.

Second, mathematical. In a genuine weight merge, every parameter in the new model sits on a straight line between the two source models. Nex measured this collinearity across all 60 layers. The result came back at 0.993. Two unrelated models in the same parameter space scored near-zero by chance. Hitting 0.993 across every single layer isn’t a coincidence. The mixing ratio held at α ≈ 0.571, stable to three decimal places.

Basically, it was nearly 60% Nex, with the rest being the base Qwen model.

“Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen—across all 60 layers and every component of the network,” Nex wrote. “There is no innocent explanation.”

Source: Nex Ecosystem
Source: Nex Ecosystem

The numbers also told a quieter story. Nex N2 Pro, released just days before Rio 3.5, scores 75.3% on Terminal-Bench 2.1—higher than Rio’s 70.8%. On GDPval, an economic forecasting benchmark scored as an Elo-style rating, Nex sits at 1,585 against Rio’s 1,533. If Rio is 60% Nex, then you’d expect it to score below Nex on Nex’s own benchmarks. It does.

Source: Nex Ecosystem
Source: Nex Ecosystem

IplanRIO responds

IplanRIO updated the Hugging Face model card—the benchmark table came down and the attribution changed.

“The model is built via a merge of nex-agi/Nex-N2-Pro and Qwen/Qwen3.5-397B-A17B, preceded by On-Policy Distillation from a stronger model,” the updated Readme says. “We detected an incorrect upload in the previous version, where the base merged version was uploaded instead of the final distilled model. We are sorry for the confusion and apologize profusely.”

No other public statement from IplanRIO has come out. Nex is now credited.

The “incorrect upload” explanation is the key claim. IplanRIO says the intended release was a distilled version of the merged base—not the raw merge itself. On-policy distillation means a stronger teacher model generates outputs, and the student trains on those while also generating its own. It’s more expensive than a raw merge, but still cheaper than training from scratch. If that step was real, then it would represent at least some original work on top of the merge.

What actually shipped, per IplanRIO, was the merged base with nothing on top.

Community observers split on what that means. Tech commentator Rafael Quintanilha gave the charitable read: Since Nex N2 Pro is itself built on Qwen, the team may have credited the underlying architecture and left it there. He also pointed out the model went viral during a World Cup match, “not necessarily ‘ready for public consumption.'”

Developer and AI YouTuber Lucas Montano noted that “merging two ~400B-class models and then applying policy distillation isn’t trivial”—while acknowledging both a technical error and a communication failure.

AI researcher Diego Ambrosio was less generous. The original launch described Rio 3.5 as the result of “autonomous post-training and proprietary fine-tuning”—framing that implied original research, not a merge.

Legal? Yes. Ethical? Well…

Model merging is completely legal. Nex N2 Pro is Apache 2.0—you can use it, modify it, and redistribute it, as long as you credit it. Qwen 3.5 is openly licensed too. Nobody’s going to court. here.

The problem was presenting the output as independently developed work without naming all the source models. The open-source community has seen this before. Earlier this year, Cursor’s Composer 2 was found to be built on Moonshot’s Kimi K2.5 without disclosure. The backlash was fast and reputational—no lawyers, just screenshots.

Building on existing open models is normal. As Decrypt has covered, stacking and merging open weights is practically its own subculture. The norm isn’t “don’t build on others’ work.” The norm is: Say what you used.

What made this louder than a typical attribution miss was the institutional wrapper. A pseudonymous developer shipping a frankenmerge under their own name is one thing. A municipal government using it to claim public-sector AI sovereignty—during the World Cup—is another. “It was a waste of resources,” one Brazilian commentator wrote.

Nex didn’t make it a war. “We are flattered that the City of Rio used our work to achieve SOTA performance,” the company wrote on X. “But in the open-source world, attribution matters.”

IplanRIO is working to upload the corrected, distilled model with full attribution in place. When that lands, the same checks will run again—and the community will find out whether the distillation actually changed anything, or whether it’s still mostly Nex with a different system prompt.

Daily Debrief Newsletter

Start every day with the top news stories right now, plus original features, a podcast, videos and more.





Source link