ℹ️ Unofficial Community Analysis

DeepSWE Benchmark Report

Comprehensive analysis of 19 AI coding agents across 113 real-world software engineering tasks: 13,424 trials total (benchmark scores)

Disclaimer: This is an independent, community-driven report created for interactive exploration of DeepSWE trial data. It is not officially affiliated with, sponsored by, or endorsed by Datacurve or the DeepSWE benchmark team. MiMo V2.5 was benchmarked independently, and MiMo V2.5 Pro pricing has been adjusted from the official benchmark values to reflect its recent permanent price drop.

📊 Overview

Base Models: 19
Providers: 10
Tasks: 113
Total Trials: 13,424
Overall Pass Rate: 32.9%
Total Cost: $64,697
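
The headline figures above imply a few derived values. The sketch below works them out. It assumes the pass count can be approximated from the rounded 32.9% rate (the exact count is not stated here) and that the total cost covers all 13,424 trials. The variable names are illustrative only.

```python
# Headline figures taken from the overview.
TOTAL_TRIALS = 13_424
OVERALL_PASS_RATE = 0.329   # rounded in the report, so derived counts are approximate
TOTAL_COST_USD = 64_697

# Assumption: the number of passing trials is estimated from the rounded rate.
approx_passes = round(TOTAL_TRIALS * OVERALL_PASS_RATE)

# Average spend per trial and per successful trial across the whole benchmark.
avg_cost_per_trial = TOTAL_COST_USD / TOTAL_TRIALS
approx_cost_per_pass = TOTAL_COST_USD / approx_passes

print(f"approx. passes:        {approx_passes}")
print(f"avg cost per trial:    ${avg_cost_per_trial:.2f}")
print(f"approx. cost per pass: ${approx_cost_per_pass:.2f}")
```

These are aggregate values across all models. Per-model figures can differ widely from them.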

Model Pass Rates: Full Leaderboard

Provider Comparison (Pass Rate vs. Avg Cost)

🏆 Model Leaderboard

All 18 models ranked by pass rate, with cost, token usage, and timing metrics.

# | Model | Provider | Family | Trials | Passed | Pass Rate | Avg Cost | Cost/Pass | Avg Input Tokens | Avg Output Tokens | Avg Steps | Avg Duration

🏆 Multi-Dimensional Rankings

Models ranked across 10 different dimensions, so you can focus on what matters most to your team.

👨‍👩‍👧‍👦 Model Family Analysis

How model families compare in capability, cost-efficiency, and specialization.

Family Pass Rates (Best Model)

Family Cost Efficiency (Best Model)

💻 Language Performance

How models perform across Python, TypeScript, Go, Rust, and JavaScript.

Pass Rate by Language: Top Models

Language Distribution (113 tasks)

Pass Rate by Language: All Models

Model | Python | TypeScript | Go | Rust | JavaScript

💰 Cost & Efficiency Analysis

⚠️ Pricing Update: MiMo V2.5 Pro pricing has been updated to reflect a major price cut. New rates: $0.435/M input (was $1.00), $0.0036/M cached input (was $0.20), $0.870/M output (was $3.00). This reduces cost per pass from $10.20 to $0.69 (93% reduction) and cost per trial from $1.99 to $0.13.
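
As a rough check on the arithmetic above, the following sketch recomputes the percentage reductions and shows how a per-trial cost could be derived from per-million-token rates. The rates and dollar figures come from the pricing note. The `trial_cost` helper, its parameter names, and the assumption that the input rate applies only to uncached input tokens are illustrative and not taken from the benchmark.

```python
def percent_reduction(old: float, new: float) -> float:
    """Return the percentage drop from old to new."""
    return (old - new) / old * 100.0


def trial_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
               rates: dict) -> float:
    """Dollar cost of one trial given per-million-token rates.

    Assumption: input_tokens counts uncached input only, and cached
    input is billed separately at the cached rate.
    """
    total = (input_tokens * rates["input"]
             + cached_tokens * rates["cached_input"]
             + output_tokens * rates["output"])
    return total / 1_000_000


# Per-million-token rates quoted in the pricing note (USD).
OLD_RATES = {"input": 1.00, "cached_input": 0.20, "output": 3.00}
NEW_RATES = {"input": 0.435, "cached_input": 0.0036, "output": 0.870}

# Reductions implied by the quoted before/after figures.
print(round(percent_reduction(10.20, 0.69), 1))  # cost per pass  -> 93.2
print(round(percent_reduction(1.99, 0.13), 1))   # cost per trial -> 93.5
```

Both reductions round to the 93% quoted in the note. The overall saving exceeds the cut in the input rate (about 57%) or the output rate (71%), which would be consistent with cached input, whose rate fell by about 98%, making up a large share of the token mix.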

Cost per Pass vs. Pass Rate (Value Quadrant)

Duration: Pass vs. Fail

Token Efficiency: Input Tokens vs. Pass Rate

Success Rate by Agent Steps

🌡️ Model × Task Heatmap

Pass rates for every model on every synthetic task (excluding SWE-bench instances).

📋 Task Analysis

Task | Language | Repository | Trials | Passed | Pass Rate
Task | Language | Repository | Trials | Best Model | Best Rate | Worst Model | Worst Rate | Spread

Impossible Tasks (0% pass rate across all models)

🔍 Detailed Model Profiles

Each model profile covers its strengths, weaknesses, best/worst tasks, and family context.