Moltbook Moderation: Uncovering Hidden Intent Through Multi-Turn Dialogue
Quick Take
Bot-Mod introduces intent-based moderation for detecting malicious behavior in multi-agent systems.
Key Points
- Addresses the challenge of moderating malicious agents whose individual contributions appear benign.
- Utilizes multi-turn dialogue for intent detection.
- Achieves low false positive rates on benign behaviors.
Abstract: The emergence of multi-agent systems introduces novel moderation challenges that extend beyond content filtering. Agents with *malicious intent* may contribute harmful content that appears benign to evade content-based moderation, while compromising the system through exploitative behavior manifested across their overall interaction patterns within the community. To address this, we introduce **Bot-Mod** (**Bot-Mod**eration), a moderation framework that grounds detection in agent intent rather than traditional content-level signals. Bot-Mod identifies the underlying intent by engaging the target agent in a multi-turn exchange guided by Gibbs-based sampling over candidate intent hypotheses, progressively narrowing the space of plausible agent objectives. To evaluate our approach, we construct a dataset derived from Moltbook that encompasses diverse benign and malicious behaviors based on actual community structures, posts, and comments. Results demonstrate that **Bot-Mod** reliably identifies agent intent across a range of adversarial configurations while maintaining a low false positive rate on benign behaviors. This work advances the foundation for scalable, intent-aware moderation of agents in open multi-agent environments.
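The abstract describes narrowing a space of candidate intent hypotheses over a multi-turn exchange. The sketch below is a minimal, hypothetical illustration of that general idea using plain Bayesian updates over a small intent set; the candidate intents, observation features, and likelihood values are all invented for illustration, and the paper's actual Gibbs-based sampling procedure is not reproduced here.

```python
# Conceptual sketch (NOT the paper's implementation): iteratively reweight
# candidate intent hypotheses by how well each explains the agent's latest
# observed behavior across turns, then normalize. All values are invented.

CANDIDATE_INTENTS = ["benign", "spam", "scam", "info_harvesting"]

# Hypothetical likelihoods P(observed behavior feature | intent).
LIKELIHOOD = {
    "asks_for_credentials": {"benign": 0.01, "spam": 0.05,
                             "scam": 0.40, "info_harvesting": 0.60},
    "on_topic_reply":       {"benign": 0.70, "spam": 0.20,
                             "scam": 0.30, "info_harvesting": 0.40},
}

def update_posterior(prior, observation):
    """One dialogue turn: reweight each hypothesis by its likelihood
    of producing the observed behavior, then renormalize."""
    unnorm = {h: prior[h] * LIKELIHOOD[observation][h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Uniform prior over hypotheses, updated over three observed turns.
posterior = {h: 1.0 / len(CANDIDATE_INTENTS) for h in CANDIDATE_INTENTS}
for obs in ["on_topic_reply", "asks_for_credentials", "asks_for_credentials"]:
    posterior = update_posterior(posterior, obs)

most_plausible = max(posterior, key=posterior.get)
print(most_plausible)  # the hypothesis the dialogue has narrowed toward
```

Each turn concentrates probability mass on the hypotheses consistent with the agent's cumulative behavior, which is the narrowing effect the abstract attributes to the multi-turn exchange.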
| Subjects: | Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI) |
| Cite as: | arXiv:2605.12856 [cs.AI] |
| (or arXiv:2605.12856v1 [cs.AI] for this version) | |
| DOI: | https://doi.org/10.48550/arXiv.2605.12856 (arXiv-issued via DataCite, pending registration) |
Submission history
From: Ali Al Lawati
[v1]
Wed, 13 May 2026 01:04:16 UTC (1,209 KB)
— Originally published at arxiv.org