The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

5/29/2026


Quick Answer

The Cognitive Categorical Transformer (CCT) reaches 21.27 validation perplexity on WikiText-103, 2.92 PPL below an identically fine-tuned GPT-2 Small baseline (24.19 PPL).

Quick Take

CCT reaches 21.27 validation perplexity on WikiText-103, 2.92 PPL (12% relative) below an identically fine-tuned GPT-2 Small baseline under a matched-step protocol. The 306M-parameter model augments a pretrained GPT-2 Small backbone with category-theoretic components, and a retrain-from-scratch ablation attributes 84% of the gain to simplicial message passing (GT-Full). Three negative results on consistency-style categorical priors support what the authors call the structure/consistency distinction: priors that add new topology help, while priors that enforce a consistency identity do not.
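For readers less familiar with the metric, perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The following is a minimal sketch of that definition only; the function name and the toy input are illustrative assumptions, and the summary does not say which tokenization the reported numbers are normalized by.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood).

    token_log_probs: natural-log probabilities the model assigned
    to each observed token (hypothetical input format).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Toy example (not from the paper): a model that assigns
# probability 1/4 to every token has perplexity of about 4.
uniform_quarter = [math.log(0.25)] * 8
print(round(perplexity(uniform_quarter), 6))
```

Read this way, a perplexity of 21.27 means the model is, on average, about as uncertain as a uniform choice among roughly 21 options per token.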

Key Points

CCT reduces validation perplexity by 12% (24.19 to 21.27) relative to an identically fine-tuned GPT-2 Small.
Simplicial message passing (GT-Full) accounts for 84% of the improvement (2.45 of 2.92 PPL).
Three consistency-style priors (sheaf smoothing, adjunction round-trip, curvature regularization) did not improve performance.
The CCT architecture has 306 million parameters.
The authors present this as the first ablation-validated evidence that simplicial message passing improves perplexity at the 306M-parameter scale on WikiText-103.


Article Content

From source RSS / original summary

arXiv:2605.28864v1, announce type: new

Abstract: The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived from category theory and several inspirations from cognitive science. Under a matched-step protocol (215,000 optimizer steps, matched data, matched optimizer and schedule) on WikiText-103, CCT reaches 21.27 validation perplexity, compared with 24.19 for an identically fine-tuned GPT-2 Small baseline.

The architecture therefore contributes a 2.92 PPL (12% relative) reduction beyond what in-domain fine-tuning alone provides. A retrain-from-scratch ablation that holds GT-Full simplicial message passing bypassed across the entire seven-phase activation schedule reaches 23.72 PPL, localizing 84% of the architectural improvement (2.45 of 2.92 PPL) to GT-Full.
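The attribution figures follow directly from the three reported perplexities. As a quick check, the sketch below recomputes them; the variable names are illustrative, and the only inputs are the numbers quoted in the abstract.

```python
# Perplexities quoted in the abstract (WikiText-103 validation).
baseline_ppl = 24.19   # identically fine-tuned GPT-2 Small
cct_ppl = 21.27        # full CCT
ablation_ppl = 23.72   # CCT retrained with GT-Full bypassed

# Total architectural improvement over the fine-tuned baseline.
total_gain = baseline_ppl - cct_ppl
relative_gain = total_gain / baseline_ppl

# Part of that improvement that disappears when GT-Full is bypassed.
gt_full_gain = ablation_ppl - cct_ppl
gt_full_share = gt_full_gain / total_gain

# Improvement that remains without GT-Full.
remaining_gain = baseline_ppl - ablation_ppl

print(f"total gain:     {total_gain:.2f} PPL ({relative_gain:.1%} relative)")
print(f"GT-Full gain:   {gt_full_gain:.2f} PPL ({gt_full_share:.0%} of total)")
print(f"remaining gain: {remaining_gain:.2f} PPL")
```

The remaining 0.47 PPL is the part of the improvement that this ablation does not attribute to GT-Full.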

We present the first ablation-validated evidence that simplicial message passing improves language-model perplexity at the 306M-parameter scale on WikiText-103. Published GPT-2 Large reaches 22.05 zero-shot PPL on WikiText-103 with 6.2x more parameters than GPT-2 Small; this paper treats that number as an external published reference, not as the architectural benchmark.
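The abstract does not describe how GT-Full is implemented, so the following is only a generic illustration of what simplicial message passing means: features live on vertices, edges, and triangles, and each simplex updates itself from its faces and cofaces. All names, the fixed mixing weights, and the toy complex are assumptions made for this sketch; a trained model would use learned transformations rather than fixed scalars.

```python
from itertools import combinations

def faces(simplex):
    """Codimension-1 faces of a simplex given as a sorted tuple of vertex ids."""
    if len(simplex) == 1:
        return []
    return [tuple(f) for f in combinations(simplex, len(simplex) - 1)]

def mean(values):
    """Average of a list, or 0.0 when there is nothing to aggregate."""
    return sum(values) / len(values) if values else 0.0

def message_passing_step(x, w_self=0.5, w_face=0.25, w_coface=0.25):
    """One round of message passing over a simplicial complex.

    x maps each simplex (sorted tuple) to a scalar feature. The complex
    must be closed under taking faces, i.e. every face is also a key of x.
    """
    # Invert the face relation to find each simplex's cofaces.
    cofaces = {s: [] for s in x}
    for s in x:
        for f in faces(s):
            cofaces[f].append(s)
    new_x = {}
    for s in x:
        from_faces = mean([x[f] for f in faces(s)])
        from_cofaces = mean([x[c] for c in cofaces[s]])
        new_x[s] = w_self * x[s] + w_face * from_faces + w_coface * from_cofaces
    return new_x

# Toy complex: a filled triangle (0, 1, 2) with a dangling edge (2, 3).
# Only vertex 0 starts with a nonzero feature.
features = {
    (0,): 1.0, (1,): 0.0, (2,): 0.0, (3,): 0.0,
    (0, 1): 0.0, (0, 2): 0.0, (1, 2): 0.0, (2, 3): 0.0,
    (0, 1, 2): 0.0,
}
after_one = message_passing_step(features)
after_two = message_passing_step(after_one)
print(after_one[(0, 1)], after_two[(0, 1, 2)], after_two[(3,)])
```

In this toy run the signal from vertex 0 reaches the triangle within two rounds, while vertex 3, which is attached only through a dangling edge, is still untouched. That higher-order path through the triangle is the kind of added topology that ordinary pairwise attention or graph message passing does not represent explicitly.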

Three negative results on consistency-style categorical priors (sheaf smoothing, adjunction round-trip, curvature regularization) and the joint structural-prior result for GT-Full and PrecisionWeightedPP together support an empirical pattern termed the *structure/consistency distinction*, in which categorical priors that add new topology improve language modeling and those that enforce a consistency identity do not.
