Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models

arXiv cs.CL · Amogh Sheth, Biruk Assefa, Yi Wen Huang, Andrew Lin, Yuhao Ge

6/19/2026

Quick Answer

This paper shows that Causal Attribution Pruning (CAP) preserves reasoning performance in pruned large language models such as Llama-3 and Mistral-7B better than existing methods, with relative accuracy gains of up to 61% over Wanda on ARC-Challenge at 20% sparsity.

Quick Take

CAP ranks attention heads by their causal impact on reasoning tasks and uses those scores to guide weight pruning. It preserves accuracy better than traditional pruning methods while reducing inference cost.

Key Points

CAP scores each attention head by the expected performance degradation when that head is masked during forward passes on a small calibration set of reasoning problems.
It achieves relative accuracy gains of up to 61% over Wanda on ARC-Challenge at 20% sparsity.
Evaluated on GSM8K, StrategyQA, and ARC-Challenge with Llama-3 and Mistral-7B.
CAP outperforms magnitude-only and activation-based pruning methods.
Performance improvements are especially notable at moderate sparsity levels (10-20%).
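
The head-masking step in the first key point can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation: the three-head "model", its weights, the calibration examples, and the function names are all hypothetical, and accuracy stands in for whatever performance metric CAP actually uses. Each head's score is the accuracy drop observed when only that head is masked.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy "model": each head is a weight vector over two input features.
HEAD_WEIGHTS: List[List[float]] = [
    [2.0, 0.0],  # head 0 carries the signal the task depends on
    [0.0, 0.1],  # head 1 contributes a weak, task-irrelevant signal
    [0.0, 0.0],  # head 2 is inert
]

Example = Tuple[Tuple[float, float], int]

# Hypothetical calibration set: the label is 1 when the first feature is positive.
CALIBRATION: List[Example] = [
    ((1.0, -1.0), 1),
    ((-1.0, 1.0), 0),
    ((0.5, -2.0), 1),
    ((-0.5, 2.0), 0),
]


def predict(x: Sequence[float], head_mask: Sequence[bool]) -> int:
    """Sum the contributions of unmasked heads and threshold at zero."""
    total = 0.0
    for weights, keep in zip(HEAD_WEIGHTS, head_mask):
        if keep:
            total += sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def accuracy(calibration: Sequence[Example], head_mask: Sequence[bool]) -> float:
    """Fraction of calibration examples predicted correctly under a head mask."""
    correct = sum(predict(x, head_mask) == y for x, y in calibration)
    return correct / len(calibration)


def head_attribution_scores(calibration: Sequence[Example]) -> List[float]:
    """Score each head by the accuracy lost when that head alone is masked."""
    n_heads = len(HEAD_WEIGHTS)
    baseline = accuracy(calibration, [True] * n_heads)
    scores = []
    for h in range(n_heads):
        mask = [True] * n_heads
        mask[h] = False  # intervene on a single head
        scores.append(baseline - accuracy(calibration, mask))
    return scores


SCORES = head_attribution_scores(CALIBRATION)
```

In this toy setup only head 0 receives a nonzero score, since masking it is the only intervention that changes any prediction. A real model would need one masked evaluation pass per head over the calibration set.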

Paper Resources

Read Paper: arxiv.org · View PDF: arxiv.org

Source Excerpt

Large language models (LLMs) excel at multi-step reasoning but incur substantial inference cost. We introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads by measuring their causal impact on reasoning tasks and uses these head-level scores to guide fine-grained weight pruning. For each attention head, CAP estimates the expected performance degradation when the head is masked during forward passes on a small calibration set of reasoning problem...
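
The excerpt says head-level scores guide fine-grained weight pruning but does not say how the two are combined. The sketch below assumes one simple possibility: each weight's importance is its magnitude multiplied by its head's score, and the lowest-importance weights are zeroed globally until the target sparsity is reached. The function name, the `eps` smoothing term, and the example weights and scores are all hypothetical.

```python
from typing import List


def prune_with_head_scores(
    weights: List[List[float]],
    head_scores: List[float],
    sparsity: float,
    eps: float = 1e-3,
) -> List[List[float]]:
    """Zero the globally least important weights.

    weights[h] holds the (flattened) weights of head h.
    Assumed importance: |w| * (head_score + eps). The eps term keeps
    magnitude as a tie-breaker inside heads whose score is zero.
    """
    scored = []
    for h, head in enumerate(weights):
        for i, w in enumerate(head):
            scored.append((abs(w) * (head_scores[h] + eps), h, i))
    scored.sort()  # ascending importance
    n_prune = round(sparsity * len(scored))  # round avoids float truncation
    pruned = [list(head) for head in weights]  # leave the input untouched
    for _, h, i in scored[:n_prune]:
        pruned[h][i] = 0.0
    return pruned


# Hypothetical weights for two heads and hypothetical head scores.
WEIGHTS = [
    [0.9, -0.2, 0.5, 0.1, -0.7],  # small weights, but a causally important head
    [1.5, -1.2, 0.8, 2.0, -0.4],  # large weights, but a head with no measured impact
]

GUIDED = prune_with_head_scores(WEIGHTS, head_scores=[1.0, 0.0], sparsity=0.2)
# Equal head scores reduce the criterion to plain magnitude pruning.
MAGNITUDE_ONLY = prune_with_head_scores(WEIGHTS, head_scores=[1.0, 1.0], sparsity=0.2)
```

With these made-up numbers, the guided variant removes the two smallest weights of the low-impact head, while the magnitude-only variant removes the two smallest weights overall, both of which sit in the head with the high score.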

Read the full article on arxiv.org
