Sentence-Level Contextual Entrainment in Large Language Models
Quick Take
This study shows that large language models (LLMs) exhibit sentence-level contextual entrainment: a sentence that appears in the prompt, even a counterfactual one, receives a significantly higher probability at inference time. Across the 26 LLMs analyzed, the effect weakens as model size increases, and it can be mitigated by turning off the 2-4% of attention heads that control it, without hurting model performance.
Key Points
- Sentence-level contextual entrainment raises the probability of sentences that appear in the prompt, even counterfactual ones, at inference time.
- The study covers 26 LLMs from seven families and two datasets spanning subjective and objective tasks.
- Larger models show a gradual decrease in contextual entrainment effects.
- Contextual entrainment is controlled by 2% to 4% of attention heads, and turning those heads off mitigates it.
- Turning off those specific heads does not hurt the model's performance.
Article Excerpt
From source RSS / original summary
arXiv:2606.24077v1. Announce Type: new.
Abstract: Contextual entrainment, which is a newly discovered phenomenon in large language models (LLMs), refers to the tendency of a model to assign higher probabilities to tokens that appear in its context. In this work, we extend this phenomenon from the token level to the sentence level by examining the per-token mean log-probability of a sentence instead of the probabilities of individual tokens.
We investigate sentence-level contextual entrainment across 26 LLMs from seven families and two datasets, which cover both subjective and objective tasks. We find that sentence-level contextual entrainment exists. This means that the sentences in the prompt (even if they are counterfactual statements) can significantly increase their probability during model inference time. As the model size increases, contextual entrainment gradually decreases.
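The sentence-level measure described in the abstract, the per-token mean log-probability of a sentence, can be illustrated with a small Python sketch. The function names `mean_log_prob` and `entrainment_gap` are hypothetical, the probabilities are made-up illustration values rather than results from the paper, and comparing the score with and without the sentence in the prompt is an assumption about how the effect might be quantified, not the authors' confirmed protocol.

```python
import math


def mean_log_prob(token_log_probs):
    """Per-token mean log-probability of one sentence.

    token_log_probs: log-probabilities the model assigned to each token
    of the sentence (one float per token).
    """
    if not token_log_probs:
        raise ValueError("sentence must contain at least one token")
    return sum(token_log_probs) / len(token_log_probs)


def entrainment_gap(with_context, without_context):
    """Hypothetical entrainment measure (an assumption, not the paper's
    definition): how much the sentence's mean log-probability rises when
    the sentence also appears in the prompt."""
    return mean_log_prob(with_context) - mean_log_prob(without_context)


# Made-up per-token probabilities for the same three-token sentence.
baseline = [math.log(0.10), math.log(0.20), math.log(0.05)]  # sentence absent from prompt
primed = [math.log(0.30), math.log(0.40), math.log(0.20)]    # sentence present in prompt

gap = entrainment_gap(primed, baseline)
print(f"mean log-prob gap: {gap:.3f}")
```

Averaging over tokens rather than summing keeps the score comparable between sentences of different lengths. A positive gap would indicate that the sentence became more likely once it appeared in the context.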
We also find that contextual entrainment is controlled by 2% to 4% of the attention heads. Turning off these attention heads can effectively mitigate contextual entrainment without hurting the model's performance.
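What "turning off" an attention head might look like can be sketched in plain Python. This assumes zero-ablation, where a disabled head's output is replaced by zeros before the per-head outputs are concatenated. The abstract does not say which ablation method the authors use, so the function `ablate_heads` and the toy values below are illustrative assumptions only.

```python
def ablate_heads(head_outputs, disabled):
    """Concatenate per-head outputs, zeroing every disabled head.

    head_outputs: list of per-head output vectors (lists of floats).
    disabled: set of head indices to switch off (zero-ablation, assumed).
    """
    merged = []
    for index, vector in enumerate(head_outputs):
        if index in disabled:
            # A disabled head contributes nothing to the concatenated output.
            merged.extend([0.0] * len(vector))
        else:
            merged.extend(vector)
    return merged


# Toy layer with four heads, each producing a two-dimensional output.
heads = [[0.5, -1.0], [2.0, 0.25], [1.5, 1.5], [-0.75, 0.0]]
ablated = ablate_heads(heads, disabled={1})
print(ablated)
```

In a real model the same idea would be applied to only the small fraction of heads identified as driving entrainment, leaving every other head untouched.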
More from arXiv cs.CL
Quantifying Prior Dominance in Systems
The study introduces the Normalized Context Utilization (NCU) metric to evaluate Retrieval-Augmented Generation (RAG) systems, revealing that Small Language Models (SLMs) outperform larger models in factual extraction. The findings indicate that traditional scaling laws yield diminishing returns, with a commercial API frequently failing against adversarial evidence due to systemic confidence collapse.