Localizing Anchoring Pathways in Language Models

arXiv cs.CL·Hillary N. Owusu, Sarah Wiegreffe, Naomi H. Feldman

6/12/2026

·~1 min·6/12/2026·en·4

Quick Answer

This study reveals that irrelevant numbers in prompts can influence language model judgments, specifically in numerical reasoning, by analyzing anchoring effects in models like Qwen and Llama.

Quick Take

Using logit-difference metrics and circuit localization, it finds that edge-level methods better capture anchoring signals, indicating shared pathways within models but inconsistent transfer between base and instruction-tuned variants.

Key Points

Irrelevant numbers in prompts create anchoring effects in language model judgments.
Logit-difference metrics validate behavioral anchoring in models like Qwen and Llama.
Edge-level methods recover anchoring signals more accurately than node-level methods.
Low- and high-anchor circuits show strong transfer within models, indicating shared pathways.
Sparse transfer across base and instruction-tuned variants suggests pathway changes post-training.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Source Excerpt

arXiv:2606. 12818v1 Announce Type: new Abstract: Irrelevant numbers in a prompt can shift language model judgments, producing anchoring effects in numerical reasoning. We study where this anchor-sensitive signal is carried inside language models using a controlled multiple-choice setup with shared answer options. We define a logit-difference metric comparing the correct answer option with the answer option corresponding to the anchor, and validate that it tracks behavioral anchoring.

Using attribution-based circuit localization on 7B--8B Qwen and Llama base and instruction-tuned models, we find that edge-level methods recover this signal more faithfully than node-level methods. …

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CL

See more →

arXiv cs.CL·Isabel Xu (The Overlake School), Cynthia Xu (The Overlake School), Rachel Ren (Edwards Vacuum Inc.), Cong Guo (The University of Memphis), Jiacheng Ding (The University of Memphis)

5d ago

FeaturedOriginal

TriAgent: Divergence-Aware Committees for Cost-Efficient Financial Sentiment Analysis

AI Summary

TriAgent introduces a cost-efficient multi-agent system for financial sentiment analysis, combining VADER, FinBERT, and Qwen2.5. It achieves an F1 score of ~0.87 with significant savings of $9.3M/year at a 10M-user scale compared to GPT-4o-mini, while also detecting hallucinations with an AUC of 0.90.

#LLM #Agent #AI Startup #Enterprise AI

Localizing Anchoring Pathways in Language Models

Quick Answer

Quick Take

Key Points

Paper Resources

Source Excerpt

Want this in your inbox every morning?

More from arXiv cs.CL

TriAgent: Divergence-Aware Committees for Cost-Efficient Financial Sentiment Analysis

RF-Agent: A Practical Framework for Building Language Agents for RFIC Design

Letting the Data Speak: Extracting Keywords from Crowdsourced Collections with AI

Quick Answer

Quick Take

Key Points

Paper Resources

Source Excerpt

Want this in your inbox every morning?

More from arXiv cs.CL

TriAgent: Divergence-Aware Multi-Agent Committees for Cost-Efficient Financial Sentiment Analysis

RF-Agent: A Practical Framework for Building Language Agents for RFIC Design

Letting the Data Speak: Extracting Keywords from Crowdsourced Collections with AI

TriAgent: Divergence-Aware Committees for Cost-Efficient Financial Sentiment Analysis