Luisa Crawford
Jul 02, 2025 17:58
DeepSWE-Preview, an open-source coding agent trained with reinforcement learning, reaches a 59% success rate on SWE-Bench-Verified, a state-of-the-art result among open-weight coding agents.
DeepSWE-Preview is a new open-source coding agent developed by the Agentica team in collaboration with Together AI. Trained with reinforcement learning (RL), it achieves a 59% pass rate on the SWE-Bench-Verified benchmark when test-time scaling is applied, according to Together AI.
Revolutionizing Software Engineering
DeepSWE-Preview is built on the Qwen3-32B model and trained using only RL. Without test-time scaling, it reaches a Pass@1 rate of 42.2% and a Pass@16 rate of 71.0%, outperforming other open-weight coding agents. The model was trained for six days on 64 H100 GPUs, using 4,500 real-world software engineering tasks from the R2E-Gym training environments.
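For readers unfamiliar with the Pass@k figures above, the sketch below shows one widely used way to compute them: the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of sampled attempts per task and c is the number that pass. The article does not say how DeepSWE-Preview's scores were computed, so treat this as an illustration of the metric rather than the team's evaluation code. The function name and the sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k attempts passes.

    n: attempts sampled for one task
    c: how many of those attempts passed
    k: attempt budget being evaluated (1 <= k <= n)
    """
    if n < 1 or not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require n >= 1, 0 <= c <= n and 1 <= k <= n")
    # C(n - c, k) counts the size-k subsets that contain no passing attempt.
    # math.comb returns 0 when k > n - c, so the result is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 16 attempts sampled, 7 of them passed.
print(pass_at_k(n=16, c=7, k=1))   # 0.4375, the plain pass fraction
print(pass_at_k(n=16, c=7, k=16))  # 1.0, at least one of all 16 passed
```

A benchmark-level score is the mean of this value over all tasks. For k = 1 the estimator reduces to the fraction of passing attempts, which is why Pass@1 is always the lowest of the Pass@k figures.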
Harnessing the Power of rLLM
DeepSWE-Preview was trained with rLLM, Agentica's framework for post-training language agents. The team has open-sourced the datasets, code, and training logs so that others can scale and improve agents using RL. The full training recipe for turning a 32B model into a coding agent is now publicly available.
Emergent Behaviors and Performance
DeepSWE-Preview showed emergent behaviors during training, such as anticipating edge cases and running thorough regression tests. These capabilities matter for complex software engineering tasks, which require navigating large codebases and keeping existing functionality intact.
Test-Time Scaling and Further Developments
DeepSWE-Preview uses test-time scaling (TTS) to improve its results: it generates multiple candidate solutions per task and selects among them using a combination of execution-free and execution-based verification. This hybrid strategy lifts its SWE-Bench-Verified score from the 42.2% single-attempt Pass@1 to the 59% headline figure. Future research aims to explore larger models and extend the approach to other domains, including web agents.
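To make the idea of hybrid verification concrete, here is a minimal sketch of best-of-n selection that blends an execution-free score with an execution-based one. It is not DeepSWE-Preview's actual selection procedure, which the article does not describe. The `Candidate` fields, the linear weighting, and the example numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate patch for a task (hypothetical structure)."""
    patch_id: str
    verifier_score: float  # execution-free: a verifier model's estimate in [0, 1]
    tests_passed: int      # execution-based: tests passed after applying the patch
    tests_run: int         # execution-based: tests executed (0 if none could run)


def hybrid_score(c: Candidate, weight: float = 0.5) -> float:
    """Blend both signals; `weight` is the share given to test results."""
    exec_rate = c.tests_passed / c.tests_run if c.tests_run else 0.0
    return weight * exec_rate + (1.0 - weight) * c.verifier_score


def select_best(candidates: list[Candidate], weight: float = 0.5) -> Candidate:
    """Return the candidate with the highest blended score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: hybrid_score(c, weight))


# Assumed example: three rollouts for the same task.
rollouts = [
    Candidate("a", verifier_score=0.9, tests_passed=1, tests_run=4),
    Candidate("b", verifier_score=0.6, tests_passed=4, tests_run=4),
    Candidate("c", verifier_score=0.7, tests_passed=0, tests_run=0),
]
print(select_best(rollouts).patch_id)  # b
```

The point of combining the two signals is that each covers a weakness of the other: test execution can be unavailable or incomplete for a given patch, while a verifier model's score alone can favor a patch that reads well but fails when run.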
DeepSWE-Preview shows that reinforcement learning can tackle long-horizon, multi-step challenges in software engineering. Because the model and its training recipe are open source, the global research community can reproduce the results and build on them.