
An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run
Quick Answer
Epoch AI's MirrorCode benchmark reveals Claude Opus 4.7 as the leader with a 56% solve rate, reconstructing a 16,000-line toolkit in 14 hours.
Quick Take
Epoch AI's MirrorCode benchmark, which tests whether AI models can recreate complete programs without access to the original code, shows Claude Opus 4.7 in the lead with a 56% solve rate; it reconstructed a 16,000-line toolkit in 14 hours. Even so, every model tested still fails on the most complex tasks, which highlights the limits of current AI capabilities. In one case, a model programmed nonstop for 19 days on a single task that cost $2,600 to run, raising questions about cost-effectiveness.
Key Points
- Claude Opus 4.7 achieved a 56% solve rate on the MirrorCode benchmark.
- The model rebuilt a 16,000-line toolkit in just 14 hours.
- All tested models failed on the most complex programming tasks.
- One AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run.
- The results raise concerns about AI's cost-effectiveness in complex programming.
Article Excerpt
Original summary (from the source RSS feed):
Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks. The article "An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run" appeared first on The Decoder.

