
AI search agents often confirm what they already know instead of actually researching the web
Quick Take
Leading AI search agents such as GPT-5.4 and Kimi K2.6 appear to use the web mostly to confirm what they learned during training rather than to do real research, at least on established benchmarks. Researchers at the Harbin Institute of Technology showed this with LiveBrowseComp, a time-based benchmark that only asks about events from the last 90 days. When the models cannot fall back on training data, their performance drops sharply.
Key Points
- GPT-5.4 and Kimi K2.6 struggle with real-time information retrieval.
- LiveBrowseComp benchmark tests performance on events from the last 90 days.
- Models' performance drops significantly when they can't rely on training data.
- Research highlights limitations in AI search agents' web research capabilities.
- Existing rankings of AI models get reshuffled when the models are tested on recent events.
Article Excerpt
From source RSS / original summary
Leading AI search agents like GPT-5.4 and Kimi K2.6 don't appear to do much actual research on established benchmarks. They mostly just use the web to confirm what they already learned during training. Researchers at the Harbin Institute of Technology found this using a new time-based benchmark called LiveBrowseComp, which only asks about events from the last 90 days. Once the models can't fall back on memory, performance falls apart and the existing rankings get reshuffled.
The article AI search agents often confirm what they already know instead of actually researching the web appeared first on The Decoder.

