Review Arcade: On the Human Alignment and Gameability of LLM Reviews

arXiv cs.AI·Hans Ole Hatzel, Sebastian Steindl, Jan Strich


5/29/2026

Quick Take

LLM-generated reviews of scientific papers align only to a limited degree with human reviews, and the degree of alignment varies substantially across prompts and models. In experiments on papers from the 2025 ACL Rolling Review (ARR), an iterative draft-revise workflow driven by LLM reviews produced a statistically significant increase in overall scores for up to 35% of papers in specific scenarios. The code for the study is available on GitHub.

Key Points

LLM-generated reviews of scientific papers are being officially piloted by major conferences.
Alignment between LLM and human reviews is limited overall and varies substantially across prompts and models.
An iterative draft-revise workflow guided by LLM reviews can raise LLM review scores in specific scenarios.
For up to 35% of papers, this "gaming" led to a statistically significant increase in overall scores.
The study's code is publicly available for further research.

Article Excerpt

From source RSS / original summary

arXiv:2605.28897v1. Abstract: LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only reviewers are using LLM-assistance, but also that authors use LLMs to revise their papers before submitting. In this work, we perform empirical experiments on papers from the 2025 ACL Rolling Review (ARR) to evaluate LLM reviews from both the author and the reviewer perspective.

First, we identify a limited alignment of LLM reviews with human ones. In the best-case scenario, the alignment is reasonable. However, we also find that LLM-human alignment varies substantially across prompts and models. Finally, we investigate the scenario in which the author uses an iterative draft-revise workflow to improve the submission according to the LLM review.

We find that this "gaming" of LLM reviews can be effective in specific scenarios, leading to a statistically significant increase of overall scores for up to 35% of papers. We publish our code: https://github.com/uhh-hcds/reviewarcade.
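
The abstract does not spell out how the draft-revise workflow is implemented, so the following is only a minimal sketch of what such a loop could look like. Every name here (`draft_revise_loop`, `toy_review`, `toy_revise`) is hypothetical and not taken from the paper or its repository. The toy reviewer and reviser are stand-ins so the example runs without model access; in practice both would presumably be LLM calls.

```python
from typing import Callable, List, Tuple


def draft_revise_loop(
    draft: str,
    review: Callable[[str], Tuple[float, str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[float]]:
    """Review a draft, revise it using the feedback, and repeat.

    Returns the best-scoring draft and the score after each review.
    Hypothetical sketch; not the procedure used in the paper.
    """
    score, feedback = review(draft)
    best_draft, best_score = draft, score
    history = [score]
    current = draft
    for _ in range(max_rounds):
        current = revise(current, feedback)
        score, feedback = review(current)
        history.append(score)
        if score > best_score:
            best_draft, best_score = current, score
    return best_draft, history


def toy_review(text: str) -> Tuple[float, str]:
    """Stand-in reviewer (assumption): rewards a 'Limitations' heading."""
    has_limitations = "Limitations" in text
    score = 4.0 if has_limitations else 3.0
    feedback = "" if has_limitations else "Add a Limitations section."
    return score, feedback


def toy_revise(text: str, feedback: str) -> str:
    """Stand-in reviser (assumption): acts only on the one known request."""
    if "Limitations" in feedback:
        return text + "\n\nLimitations\nTBD."
    return text


best, history = draft_revise_loop(
    "Title\n\nAbstract", toy_review, toy_revise, max_rounds=2
)
print(history)
```

The loop keeps the best-scoring draft rather than the last one, since a revision is not guaranteed to improve the score. Whether a score gain of this kind is statistically significant would need a separate test over many papers, which this sketch does not attempt.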

