Observable Patterns Are Not Explanations: A Causal-Geometric Analysis of Latent Reasoning Models
Quick Answer
This paper shows that, in latent reasoning models (Coconut and CODI), observable patterns do not equate to explanations of internal reasoning mechanisms.
Quick Take
A causal-geometric analysis of two latent reasoning models, Coconut and CODI, finds that observable latent-state patterns are not explanations of internal reasoning mechanisms. Latent thoughts are better viewed as hidden computation, so interpretability claims about them need matched controls and causal tests.
Key Points
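- Observable latent-state patterns in Coconut and CODI, such as BFS-like frontiers and decodable arithmetic computation, also appear in controls that lack the proposed recurrence or curriculum.
- Latent-thought utilization is graded, not binary: it scales with a thought's causal effect on model behavior.
- Decodability, attention, or static structure alone cannot establish a reasoning mechanism, and the observed patterns do not always causally affect behavior.
- Geometric analyses show that the causal effect concentrates in low-rank directions whose step-to-step geometry grows more structured as their behavioral influence increases.
- Interpreting LRMs therefore requires matched controls and causal tests.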
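The role of a matched control in the points above can be sketched in a few lines of Python. This is a toy illustration, not the paper's code or data: the function names `probe_r2` and `decodability_gap`, the least-squares probe, and the synthetic hidden states are all assumptions made for the example. It shows that a high probe score in the model says little on its own when a control scores just as high.

```python
import numpy as np

def probe_r2(states, target, train_frac=0.5):
    """Fit a least-squares linear probe on a train split; return held-out R^2."""
    n_train = int(len(states) * train_frac)
    X = np.hstack([states, np.ones((len(states), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(X[:n_train], target[:n_train], rcond=None)
    resid = target[n_train:] - X[n_train:] @ w
    total = target[n_train:] - target[n_train:].mean()
    return 1.0 - (resid @ resid) / (total @ total)

def decodability_gap(model_states, control_states, target):
    """Probe score of the model minus the probe score of the matched control."""
    return probe_r2(model_states, target) - probe_r2(control_states, target)

# Synthetic stand-in data (hypothetical): both systems encode the target linearly.
rng = np.random.default_rng(0)
n, d = 400, 16
target = rng.normal(size=n)
direction = rng.normal(size=d)
model_states = np.outer(target, direction) + 0.1 * rng.normal(size=(n, d))
control_states = np.outer(target, direction) + 0.1 * rng.normal(size=(n, d))

model_r2 = probe_r2(model_states, target)
control_r2 = probe_r2(control_states, target)
gap = decodability_gap(model_states, control_states, target)
print(f"model R^2={model_r2:.3f}  control R^2={control_r2:.3f}  gap={gap:.3f}")
```

In this toy setup the target is decodable from both sets of states, so the gap is near zero and decodability cannot be attributed to whatever distinguishes the model from its control.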
Article Excerpt
arXiv:2606.12689v1
Abstract: Latent reasoning models (LRMs) replace explicit chain-of-thought with continuous thoughts. Recent work treats observable latent-state patterns, such as BFS-like frontiers and decodable arithmetic computation, as evidence for internal reasoning mechanisms. Evaluating two LRMs (Coconut and CODI) against controls lacking the proposed recurrence or curriculum, we find these patterns also appear in the controls and do not always causally affect behavior.
Causal interventions reveal that latent-thought utilization is not binary but graded, scaling with a thought's causal effect on model behavior. Geometric analyses reveal this effect concentrates in low-rank directions whose step-to-step geometry grows more structured as their behavioral influence increases. Latent thoughts should therefore be treated as hidden computation, not hidden explanation: decodability, attention, or static structure alone cannot establish mechanism.
LRM interpretability thus requires matched controls and causal tests.
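The idea of a causal test on low-rank directions can be illustrated with a small Python sketch. It is not the authors' procedure: the helper names (`top_directions`, `ablate`, `causal_effect`), the SVD-based choice of directions, the projection ablation, and the linear toy readout are all assumptions chosen to keep the example self-contained. It compares the behavioral change from removing the top singular directions of a set of latent thoughts with the change from removing random directions of the same rank.

```python
import numpy as np

def top_directions(thoughts, k):
    """Top-k right singular vectors of the centered thought matrix (orthonormal rows)."""
    centered = thoughts - thoughts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def ablate(thoughts, directions):
    """Remove the component of each thought that lies in the span of `directions`."""
    return thoughts - (thoughts @ directions.T) @ directions

def causal_effect(readout, thoughts, directions):
    """Mean absolute change in the readout caused by the ablation."""
    return np.abs(readout(thoughts) - readout(ablate(thoughts, directions))).mean()

# Toy latent thoughts (hypothetical): a strong rank-2 signal plus small isotropic noise.
rng = np.random.default_rng(0)
n, d, k = 500, 32, 2
signal_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))   # d x k, orthonormal columns
coeffs = 5.0 * rng.normal(size=(n, k))
thoughts = coeffs @ signal_basis.T + 0.1 * rng.normal(size=(n, d))

# Toy "behavior": a linear readout that depends only on the signal subspace.
w = signal_basis @ np.array([1.0, -1.0])
readout = lambda h: h @ w

# Matched control for the intervention: random orthonormal directions of equal rank.
random_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
random_dirs = random_basis.T

top_effect = causal_effect(readout, thoughts, top_directions(thoughts, k))
random_effect = causal_effect(readout, thoughts, random_dirs)
print(f"top-{k} ablation effect={top_effect:.3f}  random ablation effect={random_effect:.3f}")
```

The random-direction ablation plays the part of a matched control for the intervention itself: a direction only counts as causally important here if removing it changes the readout more than removing an arbitrary direction of the same rank.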