AGORA: Adapter-Grounded Observation-Action Retention for Inference-Free Prompt Compression in LLM Agents

arXiv cs.AI·Haoran Zhang, Zhaohua Sun

5/27/2026


Quick Answer

AGORA is an inference-free, step-level prompt compressor for LLM agents. The paper reports that token-level extractive compressors collapse to mean reward <= 0.05 in all 17 (env, backbone, method) cells tested, while AGORA retains >= 75% of uncompressed performance in 8 of 9 cells.

Quick Take

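The paper argues that the token-level extractive compressors used for general LM context are structurally inappropriate for LLM agents: they remove the tokens that carry action semantics (identifiers, brackets, action verbs), so the environment rejects what remains. AGORA compresses at step granularity instead, combining a structural prompt parser, an always-keep floor for format- and recency-critical content, and a 125M-parameter relevance scorer. A four-way component ablation identifies the structural floor as the dominant quality lever and the learned scorer as the source of 1.0-11.5x adaptive end-to-end compression from a single fixed keep ratio.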

Key Points

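Token-level extractive compressors collapse to mean reward <= 0.05 in every one of 17 (env, backbone, method) cells spanning two independent token-level method families, despite 1.3-13.3x realized compression.
The paper names this failure mode action-grammar destruction: the tokens carrying action semantics are exactly those that self-information ranks lowest, so a general-purpose compressor removes them.
AGORA is the only compared method that retains >= 75% of uncompressed performance in 8 of 9 cells; the lone exception sits at 73%.
A four-way component ablation isolates the structural always-keep floor as the dominant quality lever.
The learned relevance scorer (125M parameters, ~2ms/step, no per-step LLM call) is the source of 1.0-11.5x adaptive end-to-end compression from a single fixed keep ratio.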
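The failure mode the abstract calls action-grammar destruction can be illustrated with a toy example. The sketch below is not the paper's setup: it assumes a unigram frequency model as a stand-in for a real language model, a hypothetical `click [ id ]` action syntax, and a hypothetical `is_valid_action` check playing the role of the environment. It only reproduces the bracket case; the paper also lists identifiers and action verbs among the low-ranked tokens.

```python
import math
import re
from collections import Counter

# Toy agent trajectory; the action syntax is hypothetical.
corpus = (
    "click [ 12 ] click [ 7 ] type [ 3 ] click [ 41 ] "
    "observation : page shows laptop price 899 "
    "observation : page shows warranty terms"
).split()

counts = Counter(corpus)
total = sum(counts.values())


def self_information(token: str) -> float:
    """Unigram self-information in bits: -log2 p(token)."""
    return -math.log2(counts[token] / total)


def prune(tokens: list[str], keep_ratio: float) -> list[str]:
    """Keep the highest self-information tokens, preserving their order."""
    k = max(1, round(len(tokens) * keep_ratio))
    ranked = sorted(
        range(len(tokens)),
        key=lambda i: self_information(tokens[i]),
        reverse=True,
    )
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept]


def is_valid_action(text: str) -> bool:
    """Hypothetical environment check: verb, bracketed numeric id."""
    return re.fullmatch(r"(click|type) \[ \d+ \]", text) is not None


action = "click [ 41 ]".split()
pruned_action = prune(action, keep_ratio=0.5)

print(" ".join(action), "->", is_valid_action(" ".join(action)))
print(" ".join(pruned_action), "->", is_valid_action(" ".join(pruned_action)))
```

In this toy corpus the brackets are the most frequent tokens, so they have the lowest self-information and are pruned first; the residual string no longer matches the action grammar.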


[Submitted on 26 May 2026]


Abstract: The token-level extractive compressors widely used for general LM context are structurally inappropriate for LLM agents: across 17 (env, backbone, method) cells spanning two independent token-level method families, every cell collapses to mean reward <= 0.05 despite 1.3-13.3x realized compression. We name and characterize this failure mode as action-grammar destruction -- the tokens carrying action semantics (identifiers, brackets, action verbs) are exactly those self-information ranks lowest, so a general-purpose compressor reliably removes them and the environment rejects the residual. The diagnosis points to step-granularity compression. We introduce AGORA, an inference-free step-level compressor combining a structural prompt parser, an always-keep floor for format- and recency-critical content, and a 125M-parameter relevance scorer trained on counterfactual next-action-change labels (~2ms/step, zero per-step LLM toll). Across the compared inference-free and LLM-based methods, AGORA is the only one retaining >= 75% uncompressed performance in 8 of 9 cells (with the lone exception at 73%); a four-way component ablation isolates the structural floor as the dominant quality lever and the learned scorer as the source of 1.0-11.5x adaptive end-to-end compression from a single fixed keep ratio.
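The three components named in the abstract (a structural parser, an always-keep floor, and a relevance scorer applied with a fixed keep ratio) might fit together roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Segment` type, the `kind` labels, `compress_steps`, and the keyword-overlap `score` function are all hypothetical, and the keyword scorer merely stands in for the 125M-parameter learned scorer. The prompt is assumed to be already parsed into segments.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    kind: str  # hypothetical labels: "system", "format", or "step"
    text: str


def compress_steps(
    segments: list[Segment],
    score: Callable[[Segment], float],
    keep_ratio: float = 0.5,
    recent_steps: int = 1,
) -> list[Segment]:
    """Drop whole steps, never individual tokens inside a step."""
    steps = [i for i, s in enumerate(segments) if s.kind == "step"]
    # Always-keep floor: every non-step segment plus the most recent steps.
    floor = {i for i, s in enumerate(segments) if s.kind != "step"}
    if recent_steps > 0:
        floor.update(steps[-recent_steps:])
    # Only older steps compete for the budget set by the fixed keep ratio.
    candidates = [i for i in steps if i not in floor]
    budget = round(len(candidates) * keep_ratio)
    best = sorted(candidates, key=lambda i: score(segments[i]), reverse=True)
    keep = floor | set(best[:budget])
    return [s for i, s in enumerate(segments) if i in keep]


segments = [
    Segment("system", "You are a shopping agent."),
    Segment("format", "Reply with: click [id] or type [id] text"),
    Segment("step", "obs: homepage banner | act: click [3]"),
    Segment("step", "obs: laptop list with prices | act: click [12]"),
    Segment("step", "obs: cookie notice | act: click [1]"),
    Segment("step", "obs: laptop filters by brand | act: click [8]"),
    Segment("step", "obs: laptop detail page | act: ?"),
]

goal_words = {"laptop"}  # hypothetical task keywords


def score(seg: Segment) -> float:
    """Stand-in scorer: count of goal words in the step text."""
    return sum(word in goal_words for word in seg.text.lower().split())


kept_segments = compress_steps(segments, score, keep_ratio=0.5)
for seg in kept_segments:
    print(f"{seg.kind}: {seg.text}")
```

Because each retained step is kept verbatim, the bracketed action syntax inside it survives, which is the property token-level pruning loses. The system prompt, format instructions, and most recent step bypass scoring entirely in this sketch.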

Comments:	10 pages, 2 figures. Code and data: this https URL
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2605.26596 [cs.AI]
	(or arXiv:2605.26596v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.26596 arXiv-issued DOI via DataCite

Submission history

From: Haoran Zhang
[v1] Tue, 26 May 2026 06:29:44 UTC (4,379 KB)

— Originally published at arxiv.org


