MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents

arXiv cs.CL·Pratibha Revankar, Kargi Chauhan, Jihye Kim, Sadiba Nusrat Nur, Vincent Siu, Chenguang Wang

6/10/2026

Quick Answer

MIRAGE introduces a dual-channel monitoring system for LLM agents, achieving AUC = 0.918 in detecting covert data encoding across various models.

Quick Take

It exploits a low-dimensional encoding subspace and significantly outperforms traditional output-only detection. False-positive rates on benign encoding range from 0% to 100%, depending on the model's geometry.

Key Points

MIRAGE monitors covert encoding in LLM agents, achieving AUC = 0.918 on 126 scenarios.
A logistic-regression probe trained on eight encoding families recovers the held-out ninth at AUC 0.975-1.000, reading the computation rather than surface features.
False-positive rates for benign encoding range from 0% to 100%, depending on model geometry.
Encoding fidelity is compromised when adversarial attacks suppress the encoding subspace.
The system distinguishes between inline encoding and tool call outsourcing strategies.

Source Excerpt

arXiv:2606.10304v1 Announce Type: new Abstract: When agents are coerced into covertly encoding sensitive data (Base64, ROT13, acrostic, synonym chains, and beyond), the resulting outputs evade output-side detection but the underlying computation does not. Across nine encoding families and eight models from five architecture families, that computation is supported by a shared low-dimensional encoding subspace in the residual stream.

A logistic-regression probe trained on eight encoding families recovers the held-out ninth at AUC 0.975-1.000, reading the computation rather than surface features. …
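
The held-out-family evaluation described in the excerpt can be illustrated with a small sketch. This is not the paper's code or data: the activations are synthetic, the "shared direction" is an assumption built into the generator, and every name (`make_synthetic_activations`, `fit_logistic_probe`, `held_out_family_auc`) is hypothetical. It only shows the general shape of the protocol: train a logistic-regression probe on eight families, then score the ninth with AUC.

```python
import numpy as np

def make_synthetic_activations(n_families=9, n_per_class=60, dim=32, seed=0):
    """Synthetic stand-in for residual-stream activations (NOT real model data).

    Assumption for illustration: encoded examples from every family are shifted
    along one shared direction, plus a small family-specific offset.
    """
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=dim)
    shared /= np.linalg.norm(shared)
    X, y, fam = [], [], []
    for f in range(n_families):
        quirk = 0.5 * rng.normal(size=dim) / np.sqrt(dim)
        benign = rng.normal(size=(n_per_class, dim))
        encoded = rng.normal(size=(n_per_class, dim)) + 4.0 * shared + quirk
        X += [benign, encoded]
        y += [np.zeros(n_per_class), np.ones(n_per_class)]
        fam += [np.full(2 * n_per_class, f)]
    return np.vstack(X), np.concatenate(y), np.concatenate(fam)

def fit_logistic_probe(X, y, lr=0.1, steps=500, l2=1e-3):
    """Plain gradient-descent logistic regression with L2 regularization."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip logits to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))  # ties count half

def held_out_family_auc(X, y, fam, held_out):
    """Train on every family except `held_out`, then evaluate on it alone."""
    train = fam != held_out
    w, b = fit_logistic_probe(X[train], y[train])
    return roc_auc(X[~train] @ w + b, y[~train])

if __name__ == "__main__":
    X, y, fam = make_synthetic_activations()
    for f in range(9):
        print(f"held-out family {f}: AUC = {held_out_family_auc(X, y, fam, f):.3f}")
```

The point of holding out a whole family is that the probe never sees that family's surface form during training, so a high held-out AUC suggests it relies on something common to all families. In this sketch that common signal exists by construction, so the numbers it prints say nothing about real models.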
