Story Operators: Decomposing the Original $\to$ Sequel Transformation in Embedding Space

arXiv cs.CL·W. Frederick Zimmerman

6/25/2026

Quick Answer

This study analyzes the geometric transformation from original novels to their sequels using all-mpnet-base-v2 embeddings, revealing a taxonomy of sequels based on PCA decomposition.

Quick Take

The PCA decomposition of the original-to-sequel displacement yields three sequel types (formulaic, concentrated, and compositional), each illustrated with author pairs from Project Gutenberg. For Twain's Tom Sawyer $\to$ Huckleberry Finn, the dominant recovered axis is structural rather than thematic.

Key Points

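Uses all-mpnet-base-v2 paragraph embeddings drawn from a precomputed index of the PG19 corpus.
Identifies three sequel types across thirteen verified author pairs: formulaic, concentrated, and compositional.
Finds that the dominant axis for Tom Sawyer $\to$ Huckleberry Finn is structural (sheltering domesticity collapsing into a picaresque road), with vernacular voice and slavery on later, smaller axes.
Releases scripts and data so that all computations are reproducible.
Corroborates the recovered geometry against Twain's 1875--76 letters to Howells, which document his authorial intent.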
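
The three sequel types can be read as a simple decision rule over two quantities: the displacement norm and the share of the displacement carried by each axis. The sketch below is illustrative only. The function name `classify_sequel` and both thresholds (`small_norm`, `dominant_share`) are assumptions made here, not values from the paper, and the second call uses a made-up norm of 0.5 purely to exercise the rule. Only the 0.12 norm and the 75% share come from the abstract.

```python
def classify_sequel(norm_d, shares, small_norm=0.2, dominant_share=0.5):
    """Toy rule for the formulaic / concentrated / compositional split.

    norm_d: length of the displacement between the two book centroids.
    shares: fraction of the squared displacement carried by each axis.
    small_norm, dominant_share: hypothetical cutoffs, not from the paper.
    """
    if norm_d < small_norm:
        return "formulaic"        # tiny overall change
    if max(shares) >= dominant_share:
        return "concentrated"     # one axis carries most of the move
    return "compositional"        # many small axes


print(classify_sequel(0.12, [0.40, 0.30]))        # norm reported for Doyle
print(classify_sequel(0.50, [0.75, 0.10]))        # 75% on one axis, as for Alcott
print(classify_sequel(0.50, [0.20, 0.15, 0.10]))  # no dominant axis
```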

[Submitted on 24 Jun 2026]

Abstract: I treat a book as a point in a sentence-embedding space and a literary transformation as an operation on points. Given an original novel and its sequel, I ask what it takes, geometrically, to turn the first into the second. Using all-mpnet-base-v2 paragraph embeddings drawn from a precomputed index of the PG19 corpus, I form the displacement $d=\bar{x}_{\rm seq}-\bar{x}_{\rm orig}$ and greedily decompose it along a content basis obtained by PCA over the two books' own paragraphs. Each component is an interpretable axis anchored by real passages at its poles. Across thirteen verified author pairs from Project Gutenberg, the decomposition reveals a small taxonomy of sequels: formulaic (a tiny, low-rank change: Doyle's Holmes collections, $\|d\|=0.12$), concentrated (one dominant axis: Alcott's Little Women $\to$ Little Men, 75% on a single move), and compositional (many small axes: Twain, Burroughs's Barsoom, Nesbit). For the canonical case, Tom Sawyer $\to$ Huckleberry Finn, the dominant recovered axis is structural -- the collapse of sheltering domesticity into a picaresque road -- rather than the famous surface themes of vernacular voice or slavery, which ride later, smaller axes; and the transformation routes through adventure-journey space rather than diluting toward generic realism. I corroborate the recovered geometry against Twain's documented authorial intent (his 1875--76 letters to Howells), which names the first-person picaresque move years in advance, and I quantify, with an explicit representation caveat, how much of the realized transformation his stated intentions span. All computations are reproducible from the released scripts and data.
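
The displacement-and-decomposition step described in the abstract can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the released scripts: random vectors stand in for the all-mpnet-base-v2 paragraph embeddings, the dimension is a toy 32, the function name `decompose_displacement` is hypothetical, and "greedy" is interpreted here as ranking the orthonormal PCA axes by the squared projection of $d$ onto each. The paper's actual procedure may differ.

```python
import numpy as np


def decompose_displacement(orig, seq, n_components=10):
    """Split d = mean(seq) - mean(orig) along PCA axes of the pooled paragraphs.

    orig, seq: arrays of shape (n_paragraphs, dim), one embedding per row.
    Returns d, the axes ordered by how much of d they carry, the signed
    coefficient of d on each axis, and each axis's share of ||d||^2.
    """
    d = seq.mean(axis=0) - orig.mean(axis=0)

    # PCA over the two books' own paragraphs: center, then take the
    # right singular vectors as an orthonormal content basis.
    pooled = np.vstack([orig, seq])
    centered = pooled - pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]

    # With an orthonormal basis, picking the best axis at each greedy step
    # reduces to sorting axes by squared projection (an assumption here).
    coeffs = basis @ d
    shares = coeffs**2 / (d @ d)
    order = np.argsort(-shares)
    return d, basis[order], coeffs[order], shares[order]


# Stand-in data: random "paragraph embeddings" with a planted shift on one
# coordinate so that the sequel centroid differs from the original's.
rng = np.random.default_rng(0)
orig = rng.normal(size=(200, 32))
seq = rng.normal(size=(180, 32))
seq[:, 0] += 1.0

d, axes, coeffs, shares = decompose_displacement(orig, seq, n_components=32)
print("||d|| =", np.linalg.norm(d))
print("top three axis shares:", shares[:3])
```

Because the difference of the two book centroids lies in the span of the centered pooled paragraphs, keeping every component reconstructs $d$ exactly and the shares sum to one. A share close to one on the first axis would correspond to the concentrated case, and a flat profile to the compositional case.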

Comments:	8 pages, 3 figures
Subjects:	Computation and Language (cs.CL)
ACM classes:	I.2.7
Cite as:	arXiv:2606.25379 [cs.CL]
	(or arXiv:2606.25379v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2606.25379 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Frederick Zimmerman
[v1] Wed, 24 Jun 2026 04:21:36 UTC (793 KB)

Originally published at arxiv.org
