New benchmark exposes how badly AI struggles with real knowledge work

The Decoder·Maximilian Schreiner

6/19/2026

Quick Answer

A new benchmark finds that even the best AI model fully solves just 3% of realistic knowledge work tasks.

Quick Take

The result shows how far current AI remains from handling practical knowledge work, which matters for industries that depend on it.

Key Points

Even the best AI model struggles with realistic knowledge work, fully solving only 3% of tasks.
The benchmark highlights significant limitations in AI's practical applications.
Industries relying on knowledge work may face challenges due to AI performance.
Current AI technologies are not yet equipped for complex knowledge tasks.

Source Excerpt

Even the best AI model fails at realistic knowledge work, fully solving just 3 percent of tasks.

Read the full article on the-decoder.com

More from The Decoder


An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

The Decoder·Matthias Bastian

6/26/2026

AI Summary

Epoch AI's MirrorCode benchmark shows Claude Opus 4.7 in the lead with a 56% solve rate; it reconstructed a 16,000-line toolkit in 14 hours. Even so, every model tested struggles with the most complex tasks, which points to the limits of current AI capabilities. One task ran nonstop for 19 days and cost $2,600, raising questions about cost-effectiveness in AI development.

#LLM #AI Coding #Inference #AI Startup