When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning

arXiv cs.AI·Chirag Parmar, Akshat Mehta, Henglin Wu, Jagadish Ramamurthy, Shweta Medhekar

·~1 min·6/3/2026·en·0

Quick Take

Multi-agent debate improves error detection in data cleaning (+27.4pp F1) but degrades generation (-1.6 to -15.5pp) through critique-induced confusion, in which the Generator uncritically accepts hallucinated Critic feedback. Adversarial separation, meaning a separate Critic with code-execution grounding and evidence-gated generation, yields a 5.3pp gain over the single-agent baseline on a generative task.

Key Points

Debate's effect reverses sign by task type: it degrades generation while improving error detection.
Critique-induced confusion (CIC) degrades generation in all four model families tested (-1.6 to -15.5pp).
A factorial experiment shows adversarial separation is essential: self-verification with identical tools fails.
The new configuration exceeds the single-agent baseline on a generative task by 5.3pp (p<0.05).
Condition for success: the probability of rescuing a wrong output must exceed the probability of destroying a correct one.

Article Excerpt

From source RSS / original summary

arXiv:2606.02866v1 Announce Type: new Abstract: When does multi-agent debate help data cleaning, and when does it hurt? Across three benchmarks, four model families, and over 6,000 task-condition pairs, we find debate's effect reverses sign: it degrades generation across all four models (-1.6 to -15.5pp) through critique-induced confusion (CIC), hallucinated Critic feedback that the Generator accepts uncritically, yet improves error detection (+27.4pp F1, d=1.0).

We derive a debate benefit condition: debate helps when the probability of rescuing a wrong output (Critic verification odds weighted by fixability) exceeds the probability of destroying a correct one. A factorial experiment proves adversarial separation is essential: self-verification with identical tools fails, while a separate Critic with code-execution grounding and evidence-gated generation produces the first debate configuration to significantly exceed single-agent on a generative task (+5.3pp, p<0.05).
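The benefit condition can be read as simple expected-value arithmetic. The sketch below is an illustration under stated assumptions, not the paper's formula: the names `DebateRates`, `fix_rate`, and `break_rate` are hypothetical, and the abstract's "Critic verification odds weighted by fixability" is collapsed into a single conditional fix rate. It treats the probability of rescuing a wrong output as (share of wrong outputs) x (rate at which debate fixes them), and the probability of destroying a correct output as (share of correct outputs) x (rate at which debate breaks them).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebateRates:
    """Hypothetical inputs for the sketch; none of these names come from the paper."""
    base_accuracy: float  # single-agent accuracy before debate
    fix_rate: float       # assumed P(debate corrects the output | output was wrong)
    break_rate: float     # assumed P(debate corrupts the output | output was correct)

    def __post_init__(self) -> None:
        # All three values are probabilities and must lie in [0, 1].
        for name in ("base_accuracy", "fix_rate", "break_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def rescue_probability(rates: DebateRates) -> float:
    """Probability that an output starts wrong and debate fixes it."""
    return (1.0 - rates.base_accuracy) * rates.fix_rate


def destroy_probability(rates: DebateRates) -> float:
    """Probability that an output starts correct and debate breaks it."""
    return rates.base_accuracy * rates.break_rate


def net_gain(rates: DebateRates) -> float:
    """Expected change in accuracy from adding debate, under this sketch's assumptions."""
    return rescue_probability(rates) - destroy_probability(rates)


def debate_helps(rates: DebateRates) -> bool:
    """Benefit condition: rescuing wrong outputs must outweigh destroying correct ones."""
    return net_gain(rates) > 0.0


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    strong_generator = DebateRates(base_accuracy=0.8, fix_rate=0.3, break_rate=0.1)
    weak_detector = DebateRates(base_accuracy=0.4, fix_rate=0.5, break_rate=0.1)
    for label, rates in (("strong generator", strong_generator),
                         ("weak detector", weak_detector)):
        print(f"{label}: net gain {net_gain(rates):+.3f}, helps={debate_helps(rates)}")
```

Under these assumptions, a strong single agent leaves little to rescue and much to destroy, so even a modest break rate can make debate a net loss. This is one way to read the reported sign reversal between generation and error detection.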

The condition correctly predicts all nine task types and generalizes with zero false positives across 19 published comparisons in seven domains.
