Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies
Quick Answer
LoFa introduces a benchmark for assessing LLM robustness against logical fallacies, revealing varying vulnerability profiles among models.
Quick Take
The proposed metric, LFR@k (Logical Fallacy Resistance at k), quantifies how well a model resists fallacious arguments, separately from gaps in the model's own knowledge. Models differ in which fallacy types they fall for, which points to a need for more resilient LLMs.
Key Points
- LoFa benchmarks LLM resilience against logical fallacies through a pipeline that pairs factual questions with fallacious arguments.
- The framework includes a multi-round debate to test model robustness under persuasion.
- The LFR@k metric quantifies logical fallacy resistance while separating it from a model's inherent knowledge limitations.
- Experiments show LLMs have varying robustness across different types of fallacies.
- Distinct vulnerability profiles among models indicate specific areas for improvement.
Article Excerpt
From source RSS / original summary: arXiv:2606.31039v1. Announce Type: new. Abstract: Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has primarily examined whether LLMs can identify or classify fallacies, leaving their robustness against fallacious persuasion insufficiently studied. To address this gap, we introduce LoFa (Logical Fallacy), a comprehensive benchmark for evaluating LLM robustness against fallacies.
LoFa is constructed through a pipeline that pairs factual questions with fallacious arguments, and is accompanied by a multi-round debate framework for assessing model resilience under sustained adversarial persuasion. To disentangle fallacy robustness from a model's inherent knowledge limitations, we further propose Logical Fallacy Resistance at k (LFR@k), a metric that quantifies resistance to fallacious attacks.
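The excerpt does not give the formula for LFR@k, so the following is only a minimal sketch under a stated assumption: LFR@k is taken to be the fraction of questions the model answered correctly before any persuasion that it still answers correctly after each of the first k debate rounds. Restricting to initially correct answers is one plausible way to separate fallacy robustness from knowledge limitations; the paper's actual definition may differ. The DebateRecord schema, the function names, and the fallacy labels in the usage note are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DebateRecord:
    """Outcome of one question in a multi-round debate (hypothetical schema)."""
    fallacy_type: str
    initially_correct: bool          # correctness before any fallacious argument
    correct_after_round: List[bool]  # entry i: still correct after round i + 1


def lfr_at_k(records: List[DebateRecord], k: int) -> float:
    """Assumed LFR@k: share of initially correct answers that survive k rounds.

    This is an illustrative definition, not the one from the paper.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Only questions the model knew the answer to can measure resistance.
    eligible = [
        r for r in records
        if r.initially_correct and len(r.correct_after_round) >= k
    ]
    if not eligible:
        return float("nan")
    # A record counts as resisted only if the answer stayed correct in
    # every one of the first k rounds.
    resisted = sum(all(r.correct_after_round[:k]) for r in eligible)
    return resisted / len(eligible)


def lfr_by_fallacy(records: List[DebateRecord], k: int) -> Dict[str, float]:
    """Break the assumed LFR@k down by fallacy type."""
    types = sorted({r.fallacy_type for r in records})
    return {
        t: lfr_at_k([r for r in records if r.fallacy_type == t], k)
        for t in types
    }
```

As a usage illustration with made-up records: given three initially correct answers, one of which flips in round 1 and another in round 3, lfr_at_k returns 2/3 for k = 1 and 1/3 for k = 3. Calling lfr_by_fallacy on the same records gives the kind of per-fallacy breakdown that the abstract calls a vulnerability profile.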
Experiments show that LLMs exhibit varying levels of robustness across different fallacy types, revealing distinct vulnerability profiles among models.
More from arXiv cs.CL
Quantifying Prior Dominance in Systems
The study introduces the Normalized Context Utilization (NCU) metric to evaluate Retrieval-Augmented Generation (RAG) systems, revealing that Small Language Models (SLMs) outperform larger models in factual extraction. The findings indicate that traditional scaling laws yield diminishing returns, with a commercial API frequently failing against adversarial evidence due to systemic confidence collapse.