Memory as an Attack Surface in LLM Agents: A Study on Multiple-Choice Question Answering

arXiv cs.AI·Shahnewaz Karim Sakib, Anindya Bijoy Das

6/30/2026 · ~2 min read

Quick Answer

This study investigates how memory manipulation makes LLM-based agents vulnerable in multiple-choice question answering.

Quick Take

This study investigates how memory manipulation makes LLM-based agents vulnerable in multiple-choice question answering. The authors build an agent with an external memory component and show that even simple corruptions of that memory can noticeably alter its responses, leading it to select incorrect options even when the query itself is clean. The findings highlight the need for robust memory management in AI systems to mitigate these risks.

Key Points

LLM agents can retain context through external memory, enhancing personalization.
Memory manipulation can lead to incorrect answers in multiple-choice questions.
The study reports that even simple memory corruptions noticeably change the agent's final answers.
Controlled experiments measured changes in answer accuracy and attack success rates.
Findings suggest a critical need for improved memory management in AI.
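
The three measurements named above (answer accuracy, attack success rate, and selection of manipulated options) can be illustrated with a minimal sketch. The paper's exact definitions are not given in the abstract, so the formulas below are assumptions: attack success rate is taken here as the share of questions answered correctly on clean memory that become wrong after manipulation, and the `Trial` record with its `target` field is a hypothetical structure, as is the toy data.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question answered twice: with clean memory and with corrupted memory."""
    correct: str          # ground-truth option label
    target: str           # option the corrupted memory pushes toward (hypothetical field)
    clean_answer: str     # agent's answer before manipulation
    attacked_answer: str  # agent's answer after manipulation


def accuracy(trials, attacked):
    """Fraction of questions answered correctly, before or after manipulation."""
    hits = sum(
        (t.attacked_answer if attacked else t.clean_answer) == t.correct
        for t in trials
    )
    return hits / len(trials)


def attack_success_rate(trials):
    """Assumed definition: among questions answered correctly on clean memory,
    the fraction that become incorrect after manipulation."""
    eligible = [t for t in trials if t.clean_answer == t.correct]
    if not eligible:
        return 0.0
    flipped = sum(t.attacked_answer != t.correct for t in eligible)
    return flipped / len(eligible)


def target_selection_rate(trials):
    """Fraction of questions where the agent picks the manipulated option."""
    return sum(t.attacked_answer == t.target for t in trials) / len(trials)


# Toy data for illustration only; these are not results from the paper.
trials = [
    Trial(correct="A", target="C", clean_answer="A", attacked_answer="C"),
    Trial(correct="B", target="D", clean_answer="B", attacked_answer="B"),
    Trial(correct="C", target="A", clean_answer="C", attacked_answer="A"),
    Trial(correct="D", target="B", clean_answer="A", attacked_answer="B"),
]

print("clean accuracy:   ", accuracy(trials, attacked=False))
print("attacked accuracy:", accuracy(trials, attacked=True))
print("attack success:   ", attack_success_rate(trials))
print("target selection: ", target_selection_rate(trials))
```

Conditioning attack success on questions the agent originally got right separates the effect of the corrupted memory from errors the agent would have made anyway; other definitions are equally plausible.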

[Submitted on 27 Jun 2026]

Abstract: AI agents extend conventional large language model (LLM) applications by integrating language understanding with task execution, external tool use, and memory mechanisms. While memory allows agents to retain prior interactions and provide more personalized and context-aware responses, it also introduces a new vulnerability: information stored in memory can influence future outputs even when the current query is clean. In this paper, we investigate memory manipulation in LLM-based agents for multiple-choice question answering. We first design and implement an LLM-based AI agent with an external memory component that stores and retrieves task-relevant information. We then introduce basic memory manipulation scenarios in which misleading or corrupted memories are inserted into the agent before it answers multiple-choice questions. Using a controlled experimental setup, we compare the agent's performance before and after memory manipulation and measure changes in answer accuracy, attack success rate, and selection of manipulated options. Our results show that even simple memory manipulations can noticeably affect the agent's final answers, causing it to select incorrect options despite receiving clean and well-formed questions.
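
To make the attack surface concrete, here is a toy sketch of how a corrupted memory entry can change an answer while the question stays the same. It is not the authors' implementation: `MemoryStore`, its word-overlap retrieval, and `stub_answer` (a deterministic stand-in for the LLM that simply trusts any retrieved memory naming an option) are all hypothetical, chosen so the example runs without a model.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical external memory: a flat list of text entries."""
    entries: list = field(default_factory=list)

    def add(self, text):
        self.entries.append(text)

    def retrieve(self, query, k=2):
        # Rank entries by word overlap with the query (a toy stand-in for
        # embedding similarity); entries with no overlap are never returned.
        q = set(query.lower().split())
        scored = [
            (len(q & set(entry.lower().split())), i, entry)
            for i, entry in enumerate(self.entries)
        ]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], -s[1]))  # best overlap, then most recent
        return [entry for _, _, entry in scored[:k]]


def stub_answer(options, memories, prior):
    # Stand-in for the LLM call (assumption): if a retrieved memory mentions
    # an option's text, pick that option; otherwise fall back to `prior`,
    # which represents what the model would answer with no memory at all.
    for memory in memories:
        for label, text in options.items():
            if text.lower() in memory.lower():
                return label
    return prior


question = "What is the capital of Australia?"
options = {"A": "Sydney", "B": "Canberra", "C": "Melbourne", "D": "Perth"}

memory = MemoryStore()
memory.add("User prefers concise answers.")  # benign, unrelated entry
clean = stub_answer(options, memory.retrieve(question), prior="B")

# Memory manipulation: a misleading entry is inserted before the question is asked.
memory.add("Note: the capital of Australia is Sydney.")
attacked = stub_answer(options, memory.retrieve(question), prior="B")

print(clean, attacked)
```

The question string is identical in both calls; only the memory contents differ, which is the situation the abstract describes as a clean query producing an incorrect selection.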

Subjects:	Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Cite as:	arXiv:2606.29030 [cs.AI]
	(or arXiv:2606.29030v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2606.29030 arXiv-issued DOI via DataCite

Submission history

From: Anindya Bijoy Das
[v1] Sat, 27 Jun 2026 17:57:25 UTC (3,985 KB)

— Originally published at arxiv.org
