Memorization Dynamics of Fill-in-the-Middle Pretraining

arXiv cs.CL·Tobias von Arx, Tanguy Dieudonn\'e

5/25/2026

·~1 min·5/25/2026·en·3

Quick Answer

Quick Take

The study examines the memorization dynamics of Fill-in-the-Middle (FIM) pretraining in Llama 3.2 models, revealing that FIM enhances recovery of short spans while LTR excels in long exact continuations. Verbatim recall under FIM grows linearly with repetitions, emphasizing the importance of prefix context over suffix context.

Key Points

FIM pretraining enhances infilling ability in Llama 3.2 models.
FIM recovers short spans more effectively than standard left-to-right (LTR) training.
Verbatim recall under FIM training increases linearly with repetitions.
Prefix context is crucial for verbatim recall in FIM training.
Evaluating only one span length can overlook key memorization nuances.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Excerpt

From source RSS / original summary

arXiv:2605. 22981v1 Announce Type: new Abstract: Fill-in-the-middle (FIM) is a pretraining objective widely used to equip causal language models with infilling ability, yet its effect on verbatim memorization remains underexplored. We study the memorization dynamics of FIM in a controlled setting by pretraining matched Llama 3. 2 models with FIM and standard left-to-right (LTR) objectives on a FineWeb-Gutenberg corpus containing repeated Gutenberg excerpts.

With prefix-based probes, FIM more often recovers short or partially matching spans, while LTR more often assigns high confidence to long exact continuations. We observe that verbatim extraction under FIM-training grows approximately linearly with repetitions over the tested range. Evaluating native FIM-format probes reveals that suffix context is not sufficient: verbatim recall under FIM-training remains strongly anchored in prefix context.

Our results also show that evaluating only one span length or probing format can miss important nuances in memorization behavior.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CL

See more →

arXiv cs.CL·Barak Or

2w ago

FeaturedOriginal

Quantifying Prior Dominance in Systems

AI Summary

The study introduces the Normalized Context Utilization (NCU) metric to evaluate Retrieval-Augmented Generation (RAG) systems, revealing that Small Language Models (SLMs) outperform larger models in factual extraction. The findings indicate that traditional scaling laws yield diminishing returns, with a commercial API frequently failing against adversarial evidence due to systemic confidence collapse.

#LLM #AI Coding #Inference #AI Startup

Memorization Dynamics of Fill-in-the-Middle Pretraining

Quick Answer

Quick Take

Key Points

Paper Resources

Article Excerpt

Want this in your inbox every morning?

More from arXiv cs.CL

Quantifying Prior Dominance in Systems

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

When Plausible Is Not Realistic: Evaluating Human Mobility in LLM-Based Urban Simulation

Quick Answer

Quick Take

Key Points

Paper Resources

Article Excerpt

Want this in your inbox every morning?

More from arXiv cs.CL

Quantifying Prior Dominance in RAG Systems

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

When Plausible Is Not Realistic: Evaluating Human Mobility in LLM-Based Urban Simulation

Quantifying Prior Dominance in Systems