Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis
Quick Answer
Prompt-level anonymization does not hide model identity in multi-agent political analysis: a fine-tuned T5-base classifier attributes anonymized texts to their source LLM family with a Macro F1 score of 0.991 under a new statement-disjoint cross-validation protocol.
Quick Take
Prompt-level anonymization was proposed to counter peer-preservation bias in multi-agent political analysis pipelines, but stylometric fingerprints survive it. A fine-tuned T5-base model identifies the source model family with Macro F1 = 0.991 under statement-disjoint cross-validation and F1 = 0.978 on 24 fully held-out statements; Claude Sonnet 4.6 and Llama-3.3-70B are also evaluated as zero-shot and few-shot classifiers. The findings have significant implications for compliance with the EU AI Act and for quality-critical multi-agent deployments.
Key Points
- T5-base achieved Macro F1 = 0.991 (±0.008) under statement-disjoint cross-validation (SD-CV) and F1 = 0.978 on 24 fully held-out statements.
- Prompt-level anonymization alone does not neutralize model identity signals in political analysis texts.
- Study evaluates three classifier approaches: zero-shot and few-shot prompting of Claude Sonnet 4.6 and Llama-3.3-70B, and a fine-tuned T5-base model.
- Findings impact compliance with EU AI Act Articles 13, 14, and 26.
- Performance knee identified at 40% of training data (~440 texts).
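The headline metric is Macro F1: the unweighted mean of per-class F1 scores, so the open-world "unknown" class weighs as much as each of the four LLM families regardless of how many texts it contains. The minimal pure-Python sketch below illustrates the computation; the label names are hypothetical placeholders, not the paper's class names, and this is not the paper's evaluation code.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1, so every class counts equally
    regardless of how many texts it has."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for any observed label.
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(per_class) / len(per_class)


# Toy example with hypothetical labels (the paper's task has four families plus "unknown").
y_true = ["family_a", "family_a", "family_b", "unknown"]
y_pred = ["family_a", "family_b", "family_b", "unknown"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Averaging over the union of observed labels mirrors what scikit-learn's `f1_score(..., average="macro")` does by default, so the two should agree on the same inputs.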
Article Content
arXiv:2606.09854v1. Abstract: Multi-agent large language model (LLM) pipelines for political statement analysis are vulnerable to peer-preservation bias: models tend to protect peer models from deactivation and show identity-dependent scoring distortions. Prompt-level anonymization was proposed as a mitigation, but prior work simultaneously documented that stylometric fingerprints survive anonymization in role-constrained outputs, raising the question of whether this mitigation is sufficient.
This paper provides the first systematic investigation of whether LLMs can identify the model family behind political analysis texts under anonymization conditions. We evaluate three classifier approaches, namely zero-shot and few-shot LLM classification (Claude Sonnet 4.6 and Llama-3.3-70B) and a fine-tuned T5-base model, on a five-class attribution task covering four commercial LLM families and an open-world 'unknown' class. We introduce a statement-disjoint cross-validation protocol (SD-CV; defined in Section 3.5) that guarantees no content overlap between training and validation data, and contrast it with a run-disjoint baseline (RD-CV). T5 achieves Macro F1 = 0.991 (±0.008) under SD-CV and F1 = 0.978 on 24 completely held-out statements, robust despite a 2.1× increase in train-test content distance versus RD-CV (0.767 vs. 0.366, p < 0.001), demonstrating genuine stylometric generalization. A fractional SD-CV analysis identifies a performance knee at 40% of training data (~440 texts).
Our findings confirm that prompt-level anonymization alone cannot neutralize model identity signals, with direct implications for EU AI Act compliance (Articles 13, 14, 26) and for computer system validation (CSV) in quality-critical multi-agent deployments.
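The two cross-validation protocols differ in which unit is kept whole when folds are built. The sketch below assumes each text carries a statement identifier and a run identifier (the field names `statement_id` and `run_id` and the `Sample` type are hypothetical) and uses a simple greedy group K-fold; the paper's actual fold construction is defined in its Section 3.5 and is not reproduced in this summary. Grouping by statement gives the SD-CV guarantee that no statement appears on both sides of a split, whereas grouping by run can let texts about the same statement fall into both training and validation data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass(frozen=True)
class Sample:
    text_id: str
    statement_id: str  # hypothetical: the political statement the text analyses
    run_id: str        # hypothetical: the pipeline run that produced the text
    label: str         # source LLM family, or "unknown"


def group_kfold(samples: list[Sample], key: Callable[[Sample], Hashable], k: int):
    """Split samples into k folds so that every group (as defined by `key`) lands
    entirely in one fold. Returns a list of (train, validation) pairs."""
    groups: dict = defaultdict(list)
    for s in samples:
        groups[key(s)].append(s)
    if not 2 <= k <= len(groups):
        raise ValueError("k must be between 2 and the number of groups")
    folds: list[list[Sample]] = [[] for _ in range(k)]
    # Greedy balancing: largest groups first, each into the currently smallest fold.
    for _, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0]))):
        min(folds, key=len).extend(members)
    return [
        ([s for j, fold in enumerate(folds) if j != i for s in fold], folds[i])
        for i in range(k)
    ]


def statement_overlap(train: list[Sample], val: list[Sample]) -> set[str]:
    """Statements that appear on both sides of a split (empty under SD-CV)."""
    return {s.statement_id for s in train} & {s.statement_id for s in val}


# Toy data: 6 statements, each analysed in 3 runs by 2 hypothetical families.
samples = [
    Sample(f"t{st}-{run}-{fam}", f"s{st}", f"r{run}", fam)
    for st in range(6)
    for run in range(3)
    for fam in ("family_a", "family_b")
]
sd_cv = group_kfold(samples, key=lambda s: s.statement_id, k=3)  # statement-disjoint
rd_cv = group_kfold(samples, key=lambda s: s.run_id, k=3)        # run-disjoint
print([len(statement_overlap(tr, va)) for tr, va in sd_cv])  # [0, 0, 0]
print([len(statement_overlap(tr, va)) for tr, va in rd_cv])  # [6, 6, 6]
```

Placing the largest groups first into the currently smallest fold keeps fold sizes roughly balanced; in practice, scikit-learn's `GroupKFold` serves the same purpose when given the statement identifiers as `groups`.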