Legal Domain Adaptation of Modern BERT Models

arXiv cs.CL·Dominik Stammbach, Peter Henderson

6/30/2026

Quick Answer

The study shows that further pre-training ModernBERT on US court opinions yields significant improvements over vanilla ModernBERT on all datasets connected to US court opinions.

Quick Take

The adapted models can process sequences of up to 8,192 tokens, can be used to compute embeddings of legal passages, and can quickly rerank hundreds of legal passages for a given search query. All model checkpoints are publicly available.
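The reranking use case can be illustrated with a minimal sketch. It assumes a bi-encoder setup in which the query and each passage have already been embedded (here by mean-pooling per-token vectors, a common but assumed choice), and passages are then ordered by cosine similarity. The paper's actual pooling and reranking setup may differ. The helpers `mean_pool`, `cosine`, and `rerank` are hypothetical, and the toy 3-dimensional vectors stand in for real model outputs.

```python
import math
from typing import List, Optional, Sequence, Tuple


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single passage embedding (assumed pooling)."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(
    query_emb: Sequence[float],
    passage_embs: Sequence[Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs sorted by descending similarity."""
    scored = [(i, cosine(query_emb, p)) for i, p in enumerate(passage_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for real model outputs.
    query = [1.0, 0.0, 0.5]
    passages = [
        mean_pool([[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]),  # unrelated passage
        mean_pool([[1.0, 0.1, 0.4], [0.9, 0.0, 0.6]]),  # close match
        [0.5, 0.5, 0.5],                                # partial match
    ]
    for idx, score in rerank(query, passages):
        print(idx, round(score, 3))
```

With real checkpoints, the toy vectors would be replaced by the model's token outputs for each passage; scoring hundreds of candidates then costs one cosine computation per passage once the embeddings exist.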

Key Points

ModernBERT further pre-trained on US court opinions shows significant gains over vanilla ModernBERT.
Models can process sequences of up to 8,192 tokens for legal text.
Further pre-training an existing ModernBERT checkpoint outperforms pre-training from scratch in the authors' experiments.
All model checkpoints are publicly released for further research.
The gains are similar to those reported in early work on domain adaptation of BERT-like models.
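Even an 8,192-token context may not hold an entire long opinion. One common workaround, shown here as a sketch rather than anything the paper describes, is to split an already-tokenized document into overlapping windows. The helper `split_into_windows`, the 512-token `overlap`, and the two positions reserved for special tokens are illustrative assumptions.

```python
def split_into_windows(token_ids, max_len=8192, overlap=512, n_special=2):
    """Split a list of token ids into overlapping windows that fit the context limit.

    max_len:   model context size (8,192 tokens for the adapted models).
    overlap:   tokens shared by consecutive windows (assumed value).
    n_special: positions reserved per window for special tokens (assumption).
    """
    if not token_ids:
        return []
    body = max_len - n_special  # content tokens that fit in one window
    if body <= overlap:
        raise ValueError("overlap must be smaller than the usable window length")
    step = body - overlap  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + body])
        if start + body >= len(token_ids):
            break  # this window reaches the end of the document
        start += step
    return windows


if __name__ == "__main__":
    doc = list(range(20_000))  # stand-in for a tokenized 20,000-token opinion
    chunks = split_into_windows(doc)
    print(len(chunks), [len(c) for c in chunks])  # prints: 3 [8190, 8190, 4644]
```

The overlap ensures that text near a window boundary also appears with surrounding context in a neighboring window; a larger overlap means more windows, and therefore more forward passes, per document.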

[Submitted on 26 Jun 2026]

Abstract: We investigate domain adaptation of modern BERT models in the legal domain. We further pre-train ModernBERT on all US court opinions using the masked language modeling objective. Although ModernBERT has been trained on roughly 500x more data than original BERT, we still find that this model benefits from further pre-training and domain adaptation in the legal domain: we report significant improvements compared to vanilla ModernBERT on all datasets connected to US court opinions. We find gains similar to those reported in early work on domain adaptation of BERT-like models. However, from scratch pre-training does not match the performance of further pre-training an existing ModernBERT checkpoint in our experiments. The resulting models are capable of processing sequences up to 8,192 tokens, and can be used to compute meaningful embeddings of legal passages, or could quickly rerank hundreds of legal passages for a given search query. We release all model checkpoints publicly.
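The masked language modeling objective named in the abstract corrupts a fraction of input tokens and trains the model to recover the originals. Below is a minimal sketch of the classic BERT corruption recipe (select 15% of positions; of those, replace 80% with a mask token, 10% with a random token, and leave 10% unchanged). The abstract does not state the masking rate or recipe used here, so these values may differ from the paper's. `MASK_ID` and `VOCAB_SIZE` are hypothetical; `-100` is used as the label to ignore because it is PyTorch's default `ignore_index` for cross-entropy loss.

```python
import random

MASK_ID = 4          # hypothetical id of the mask token
VOCAB_SIZE = 50_000  # hypothetical vocabulary size
IGNORE = -100        # label skipped by the loss (PyTorch's default ignore_index)


def mask_tokens(token_ids, mask_prob=0.15, rng=None, special_ids=frozenset()):
    """Return (inputs, labels) for one MLM training example.

    Labels hold the original id at selected positions and IGNORE elsewhere,
    so the loss is computed only on tokens the model must reconstruct.
    Special tokens (e.g. separators) are never selected.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token so the model cannot rely on seeing [MASK]
    return inputs, labels
```

Further pre-training in this sense means starting from the released ModernBERT weights and continuing to minimize this loss on court opinions, rather than initializing the weights randomly as in from-scratch pre-training.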

Comments:	To appear in Proceedings of the 21st International Conference on Artificial Intelligence and Law (ICAIL 2026), June 9-12, 2026, Singapore
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2606.28538 [cs.CL]
	(or arXiv:2606.28538v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2606.28538 arXiv-issued DOI via DataCite

Submission history

From: Dominik Stammbach
[v1] Fri, 26 Jun 2026 18:44:11 UTC (186 KB)

Originally published at arxiv.org