What Drives Interactive Improvement from Feedback?

arXiv cs.AI·Bart{\l}omiej Cupia{\l}, Jan {\L}ojek, Miko{\l}aj Garstecki, Szymon Pob{\l}ocki, Alicja Ziarko, Piotr Mi{\l}o\'s

12h ago

·~2 min·7/1/2026·en·0

Quick Answer

Quick Take

The study reveals that multi-turn language agents show limited improvement from self-generated feedback compared to strong external feedback, emphasizing the importance of the student's ability to act on feedback. The controlled evaluation across models like Omni-MATH and Codeforces indicates that feedback must provide specific guidance to enhance performance effectively.

Key Points

Multi-turn improvement often lacks evidence of effective feedback use.
Self-generated feedback adds minimal value compared to unguided self-refinement.
Strong external teachers yield significantly larger feedback-specific gains.
Student's ability to utilize feedback is crucial for interactive improvement.
Controlled evaluation framework is available at https://j-lojek.github.io/feedback-generation-is-a-bottleneck/

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Content

From source RSS / original summary

arXiv:2606. 30774v1 Announce Type: new Abstract: We study when natural-language feedback produces improvement beyond the gains obtainable from repeated attempts alone. In multi-turn language agent setting, higher final accuracy can reflect useful feedback, but it can also arise from resampling, format correction, or additional test-time computation.

To separate these effects, we introduce a controlled student-teacher protocol across Omni-MATH, Codeforces, BBEH Linguini, and ARC-AGI1, evaluating thirteen open-weight models in both student and teacher roles. We compare external feedback, self-feedback, and unguided self-refinement, while varying interaction history, task difficulty, and teacher access to privileged task information.

Across settings, we find that multi-turn improvement is often not evidence of feedback use: self-generated feedback adds little beyond unguided self-refinement, whereas the strongest external teachers produce substantially larger feedback-specific gains, suggesting that useful feedback must provide guidance beyond generic retry.

Dense student-teacher interaction matrices further show that interactive gains are driven more by the student's ability to use feedback than by the teacher's identity, although teacher choice remains important for a fixed student. These results suggest that feedback-based agents should be evaluated against repeated-attempt baselines, and that ability to act on feedback, not merely feedback availability, is a central bottleneck for interactive improvement.

We release our controlled student-teacher evaluation framework at https://j-lojek. github. io/feedback-generation-is-a-bottleneck/.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.AI

See more →

arXiv cs.AI·Binghai Wang, Chenlong Zhang, Dayiheng Liu, Jiajun Zhang, Jiawei Chen, Mouxiang Chen, Rongyao Fang, Siyuan Zhang, Xuwu Wang, Yuheng Jing, Zeyao Ma, Zeyu Cui

5d ago

FeaturedOriginal

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

AI Summary

As coding agents evolve, verifying solutions becomes more challenging than generating them, necessitating a focus on scalable, faithful, and robust verification methods. The study reveals that no fixed reward function can sustain effectiveness as model capabilities advance, emphasizing the need for verification to evolve alongside solution generation.

#Agent #AI Coding #Inference #Policy