Language Acquisition Device in Large Language Models

arXiv cs.CL · Masato Mita, Taiga Someya, Ryo Yoshida, Yohei Oseki

5/19/2026

Quick Take

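LAD-inspired pre-pretraining on the MP-STRUCT formal language matches strong formal-language baselines in token efficiency and adds a human-like resistance to structurally implausible languages.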

Key Points

Pre-pretraining on synthetic languages improves data efficiency.
MP-STRUCT encodes hierarchical composition, feature-based dependencies, and long-distance displacement via MERGE, AGREE, and MOVE.
Functional landmarks, which reduce ambiguity in dependency resolution, are a key driver of effective pre-pretraining.


[Submitted on 16 May 2026]


Abstract: Large Language Models (LLMs) remain substantially less data-efficient than humans. Pre-pretraining (PPT) on synthetic languages has been proposed to close this gap, with prior work emphasizing highly expressive formal languages such as $k$-Shuffle Dyck. Inspired by the Language Acquisition Device (LAD) hypothesis, which posits that innate constraints preemptively restrict the learner's hypothesis space to natural-language-like structure, we propose LAD-inspired PPT: pre-pretraining on MP-STRUCT, a formal language whose strings encode hierarchical composition, feature-based dependencies, and long-distance displacement via MERGE, AGREE, and MOVE. A brief 500-step PPT with MP-STRUCT matches strong formal-language baselines in token efficiency while additionally imparting a human-like resistance to structurally implausible languages (e.g., REVERSE). Analyzing simplified variants, we find that MP-STRUCT CORE outperforms $k$-Shuffle Dyck despite not being definable in C-RASP (a formal bound on transformer expressivity), challenging the prior hypothesis that effective PPT languages must be both hierarchically expressive and circuit-theoretically learnable. We show that functional landmarks, which reduce dependency resolution ambiguity, are a key driver, suggesting that effective PPT design depends not only on expressivity but also on the accessibility of dependency resolution.
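For readers unfamiliar with the $k$-Shuffle Dyck baseline: a string over $k$ bracket types belongs to it when each type is balanced on its own, while brackets of different types may interleave freely (so `([)]` is accepted, unlike in ordinary Dyck-$k$). The minimal Python sketch below checks and samples such strings. The token encoding, the bracket alphabet, the helper names, and the sampling strategy are illustrative assumptions, not the data generator used in the paper, and the sketch says nothing about MP-STRUCT, whose definition is only summarized in the abstract.

```python
import random

# Illustrative alphabet (assumption): bracket type t uses PAIRS[t], so k <= 4 here.
PAIRS = ["()", "[]", "{}", "<>"]


def is_k_shuffle_dyck(tokens, k):
    """True if every bracket type in [0, k) is balanced independently.

    Tokens are (type, is_open) pairs, a hypothetical encoding for illustration.
    """
    depth = [0] * k  # one counter per bracket type
    for bracket_type, is_open in tokens:
        if not 0 <= bracket_type < k:
            return False
        depth[bracket_type] += 1 if is_open else -1
        if depth[bracket_type] < 0:  # closed a type with no pending open bracket
            return False
    return all(d == 0 for d in depth)


def sample_k_shuffle_dyck(k, n_pairs, rng):
    """Sample a member with n_pairs bracket pairs spread over k types.

    Any type with a pending open bracket may be closed at any time, because
    different types are not required to nest inside one another.
    """
    depth = [0] * k
    opened = 0
    tokens = []
    while opened < n_pairs or any(depth):
        closable = [t for t in range(k) if depth[t] > 0]
        can_open = opened < n_pairs
        # Open a new bracket when nothing is closable, otherwise flip a coin.
        if can_open and (not closable or rng.random() < 0.5):
            t = rng.randrange(k)
            depth[t] += 1
            opened += 1
            tokens.append((t, True))
        else:
            t = rng.choice(closable)
            depth[t] -= 1
            tokens.append((t, False))
    return tokens


def render(tokens):
    """Turn (type, is_open) tokens back into bracket characters."""
    return "".join(PAIRS[t][0 if is_open else 1] for t, is_open in tokens)


def parse(text):
    """Turn bracket characters into (type, is_open) tokens."""
    tokens = []
    for ch in text:
        for t, pair in enumerate(PAIRS):
            if ch in pair:
                tokens.append((t, ch == pair[0]))
                break
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


if __name__ == "__main__":
    print(is_k_shuffle_dyck(parse("([)]"), 2))  # crossing types are allowed
    print(render(sample_k_shuffle_dyck(k=3, n_pairs=5, rng=random.Random(0))))
```

Because each type needs only its own counter, membership is a single left-to-right pass with $k$ counters. The coin-flip sampler is a simple choice for illustration and does not draw uniformly from all members of a given length.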

Comments:	Accepted to ACL2026 Main Conference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2605.16758 [cs.CL]
	(or arXiv:2605.16758v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.16758 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Masato Mita
[v1] Sat, 16 May 2026 02:13:32 UTC (215 KB)

Originally published at arxiv.org
