Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent
Quick Take
This study reports evidence that large language models (LLMs) encode formal-syntactic abstractions such as phase boundaries, which Universal Dependencies (UD) does not represent. Across 13 LLMs from four families, 12 show a phase-count gradient on a cross-clause pair, all 13 show the sign asymmetry predicted by phase-internal cohesion on a within-clause pair, and activation patching indicates the representations are causally active in 12 of 13. The authors take this to suggest that distributional pretraining can induce syntactic structure beyond what UD annotates.
Key Points
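- Structural probes are evaluated on wh-movement stimuli whose UD distances are identical across conditions, so any non-zero effect reflects structure beyond UD.
- 12 of 13 LLMs show a phase-count gradient on a cross-clause pair; all 13 show a sign asymmetry on a within-clause pair.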
- Findings suggest pretraining induces representations beyond UD's annotation limits.
- Activation patching confirms the representations are causally active in 12 of 13 models.
- UD-based probing provides a lower bound on syntactic encoding.
Article Content
From source RSS / original summary
arXiv:2605.26431v1
Abstract: Structural probes train on Universal Dependencies (UD), which does not encode formal-syntactic abstractions such as phase boundaries or phase-internal cohesion. Whether large language models (LLMs) encode these remains an open question that UD-based probing cannot answer by construction. We evaluate structural probes on wh-movement stimuli where UD distances are invariant across conditions by design -- any non-zero effect therefore reflects structure beyond UD.
The three conditions -- bare small clause, infinitival, and finite -- are ordered by the number of Minimalist Program (MP) phase boundaries the wh-element crosses. Across 13 LLMs from four families, we find a phase-count gradient on a cross-clause pair (12/13 models) and a 13/13 sign asymmetry on a within-clause pair whose UD distance is identical across conditions -- the latter specifically predicted by phase-internal cohesion, an MP abstraction invisible to UD by construction.
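The comparison described above can be illustrated with a minimal sketch. It assumes a linear structural probe of the commonly used form, where the distance between two tokens is the squared L2 norm of their projected difference; the abstract does not specify the probe parameterization, so the matrix `B`, the condition keys, and both helper functions are hypothetical. The direction of the gradient is also not stated in the abstract, so the check accepts a strict trend in either direction.

```python
from statistics import mean

# Condition order as listed in the abstract (ordered by phase-boundary count).
# The dictionary keys are hypothetical labels, not names from the paper.
CONDITION_ORDER = ("small_clause", "infinitival", "finite")


def probe_distance(B, h_i, h_j):
    """Squared L2 distance between two token vectors after the linear map B.

    B is a list of rows; h_i and h_j are equal-length lists of floats.
    """
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in B]
    return sum(p * p for p in projected)


def is_monotonic_gradient(distances_by_condition, order=CONDITION_ORDER):
    """True if mean probe distance strictly rises or strictly falls along `order`."""
    means = [mean(distances_by_condition[c]) for c in order]
    pairs = list(zip(means, means[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing


if __name__ == "__main__":
    # Synthetic numbers for illustration only; these are not results from the paper.
    toy = {
        "small_clause": [1.0, 1.2],
        "infinitival": [1.5, 1.7],
        "finite": [2.0, 2.4],
    }
    print(is_monotonic_gradient(toy))
```

Because the UD distance for each word pair is held constant across the three conditions, any systematic trend in the probe distance would have to come from information the UD annotation does not carry.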
Activation patching confirms the representations are causally active in 12/13 models. These findings suggest that distributional pretraining can induce representations aligned with formal-syntactic abstractions beyond the reach of annotation-based probing; UD-grounded probes provide a lower bound on syntactic encoding, not an upper bound.
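For readers unfamiliar with activation patching, the general idea can be sketched on a toy network. This is an illustration under stated assumptions, not the paper's setup: the abstract does not say which layers, positions, or metrics were used, and the two-layer ReLU model, `forward`, and `patching_effect` below are all hypothetical. The sketch caches a hidden activation from a "clean" input, writes it into a run on a "corrupted" input, and reports how much of the output difference is restored.

```python
def forward(x, W1, w2, patch=None):
    """Toy two-layer network: ReLU hidden layer followed by a linear readout.

    `patch` optionally maps hidden-unit index -> value to overwrite before readout.
    Returns (output, hidden_activations).
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if patch is not None:
        for idx, value in patch.items():
            hidden[idx] = value
    output = sum(w * h for w, h in zip(w2, hidden))
    return output, hidden


def patching_effect(x_clean, x_corrupt, W1, w2, unit):
    """Fraction of the clean-vs-corrupt output gap restored by patching one unit."""
    clean_out, clean_hidden = forward(x_clean, W1, w2)
    corrupt_out, _ = forward(x_corrupt, W1, w2)
    patched_out, _ = forward(x_corrupt, W1, w2, patch={unit: clean_hidden[unit]})
    gap = clean_out - corrupt_out
    if gap == 0:
        return 0.0  # nothing to restore, so the effect is defined as zero here
    return (patched_out - corrupt_out) / gap


if __name__ == "__main__":
    # Hypothetical weights: the readout depends only on hidden unit 0.
    W1 = [[1.0, 0.0], [0.0, 1.0]]
    w2 = [1.0, 0.0]
    for unit in (0, 1):
        print(unit, patching_effect([2.0, 1.0], [0.0, 1.0], W1, w2, unit))
```

A unit whose patched value moves the output toward the clean run is causally involved in the behavior, which is the sense in which a representation is called "causally active"; a unit with no effect may still be decodable by a probe without being used by the model.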