When transformers learn "impossible" languages, what do they learn?

arXiv cs.CL·Ram Janarthan, Coleman Haley, Sharon Goldwater

7/1/2026 · ~1 min · en

Quick Answer

GPT-2 style models trained on 'impossible' variants of English lose grammatical sensitivity only gradually, but they fail markedly at generation, producing substantially fewer high-quality sentences at longer lengths.

Quick Take

The paper tests two explanations for why 'impossible' languages are unattested: deficient grammatical sensitivity and deficient generative production. Grammatical sensitivity, measured on BLiMP minimal pairs, degrades only gradually in GPT-2 style models, whereas generation fails markedly at longer sentence lengths. The authors therefore propose generative deficiency and transmission failure as a plausible link between model behaviour and the non-attestation of such languages.

Key Points

GPT-2 style models trained on perturbed 'impossible' variants of English lose grammatical sensitivity only gradually.
The extent of that degradation is mediated by the language's information locality.
In generation, the models fail markedly, producing substantially fewer high-quality sentences at longer lengths.
The results point to generative deficiency and transmission failure as a plausible explanation for the non-attestation of impossible languages.


Article Content

From source RSS / original summary

arXiv:2606.30815v1. Announce Type: new. Abstract: Recent work suggests that transformer language models show a bias towards human languages over unnatural ("impossible") languages argued to be unacquirable by humans. However, this literature has largely based these claims on differences in sample efficiency and test-set perplexity, rather than on direct evaluations of the linguistic capacities that could plausibly explain non-attestation in human languages.

We evaluate two theoretically motivated linking hypotheses: impossibility arising from deficiencies in grammatical sensitivity or generative production. Using GPT-2 style models trained on perturbed "impossible" variants of English, we measure sensitivity to grammaticality using BLiMP minimal pairs, finding that model performance exhibits only gradual degradation, mediated by the language's information locality.
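A minimal sketch of how minimal-pair sensitivity is commonly scored may help here: a pair counts as correct when the model assigns a higher score to the grammatical sentence than to its ungrammatical twin. The function name `minimal_pair_accuracy`, the example sentences, and the `toy_logprob` table are all hypothetical stand-ins; the paper's own evaluation code and scoring details are not given in the abstract.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (grammatical, ungrammatical) pairs in which the
    grammatical sentence receives the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical log-probabilities standing in for a trained model's scores.
toy_logprob = {
    "The cats sleep.": -10.0,
    "The cats sleeps.": -12.5,
    "She has eaten.": -9.0,
    "She have eaten.": -8.5,  # the toy "model" gets this pair wrong
}
pairs = [
    ("The cats sleep.", "The cats sleeps."),
    ("She has eaten.", "She have eaten."),
]
accuracy = minimal_pair_accuracy(pairs, toy_logprob.__getitem__)
print(accuracy)
```

In practice `score` would be replaced by a function that sums a language model's token log-probabilities over the sentence; the lookup table is used here only to keep the sketch self-contained.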

In contrast, these models exhibit pronounced failures in generation, producing substantially fewer high-quality sentences at longer lengths. Together, these results suggest generative deficiency and transmission failures as a plausible linking hypothesis between language model behaviour and non-attestation of impossible languages.
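One way to picture the generation result is to group sampled sentences by length and compute the share judged high-quality in each group. The sketch below assumes whitespace tokenization, a bucket width of five words, and a caller-supplied `is_high_quality` predicate; all three are illustrative assumptions, since the abstract does not say how quality or length was measured.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable


def quality_rate_by_length(
    sentences: Iterable[str],
    is_high_quality: Callable[[str], bool],
    bucket_size: int = 5,
) -> Dict[int, float]:
    """Map each length bucket (keyed by its lower bound, in words)
    to the fraction of sentences in it that pass is_high_quality."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    totals: Dict[int, int] = defaultdict(int)
    passed: Dict[int, int] = defaultdict(int)
    for sentence in sentences:
        n_words = len(sentence.split())  # assumed whitespace tokenization
        bucket = (n_words // bucket_size) * bucket_size
        totals[bucket] += 1
        if is_high_quality(sentence):
            passed[bucket] += 1
    return {b: passed[b] / totals[b] for b in sorted(totals)}


# Toy samples; the trailing-period check is a placeholder quality judge.
samples = [
    "a b c.",           # 3 words, passes
    "a b c d",          # 4 words, fails
    "a b c d e f g",    # 7 words, fails
    "a b c d e f g h",  # 8 words, fails
]
rates = quality_rate_by_length(samples, lambda s: s.endswith("."))
print(rates)
```

A pattern like the one reported would show up as rates that fall in the higher-length buckets.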


