A Fine-Tuned BERT Classifier for Personal-Letter Titles in Late-Ming and Early-Qing Collected Works

arXiv cs.CL·Queenie Luo

5/25/2026

Quick Take

Lepton is a fine-tuned BERT classifier that predicts, from the title alone, whether an entry in a Classical Chinese wenji table of contents is a personal letter or a closely confusable preface.

Key Points

Fine-tuned from bert-base-chinese on 5,438 hand-labeled wenji titles from thirty-three late-Ming and early-Qing literati.
Deployed on Hugging Face for public access.
Used at the China Biographical Database (CBDB) to identify approximately 55,000 letters across mid-Ming through early-Qing wenji.

Article Excerpt

From source RSS / original summary

arXiv:2605.23103v1. Announce Type: new. Abstract: I present Lepton (Letter Prediction), a fine-tuned BERT classifier that predicts whether a title in a Classical Chinese wenji table of contents is a personal letter or a closely confusable preface (particularly the farewell-preface). Lepton fine-tunes bert-base-chinese on 5438 hand-labeled wenji titles from thirty-three late-Ming and early-Qing literati.

I have deployed the model on Hugging Face, and it has been used at the China Biographical Database (CBDB) to identify approximately fifty-five thousand letters across mid-Ming through early-Qing wenji, populating the Ming Letter Platform.
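As a rough illustration of the setup the abstract describes, the sketch below loads bert-base-chinese with a two-class head and classifies a batch of titles using the Hugging Face transformers library. It is not the author's code. The label names, the helper names, the max_length value, and the two example titles are assumptions made for illustration; the abstract does not state the label scheme, the training hyperparameters, or the repository name of the deployed model. A freshly loaded head is untrained, so its predictions are meaningful only after fine-tuning on labeled titles.

```python
from dataclasses import dataclass

# Assumed label scheme: the abstract only says "personal letter" versus
# "closely confusable preface"; these names and ids are hypothetical.
LABEL2ID = {"preface": 0, "letter": 1}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}


@dataclass
class TitleExample:
    title: str  # a wenji table-of-contents title
    label: str  # "letter" or "preface"


def encode_labels(examples):
    """Map string labels to the integer ids a 2-class head expects."""
    return [LABEL2ID[ex.label] for ex in examples]


def build_model_and_tokenizer(checkpoint="bert-base-chinese"):
    """Load the base checkpoint with a fresh 2-class head (downloads weights)."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABEL2ID),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def predict(titles, tokenizer, model):
    """Return one predicted label name per title."""
    import torch

    # Titles are short, so a small max_length is assumed to be enough.
    batch = tokenizer(
        titles, padding=True, truncation=True, max_length=64, return_tensors="pt"
    )
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return [model.config.id2label[i] for i in logits.argmax(dim=-1).tolist()]


# Illustrative titles only, not taken from the Lepton dataset.
examples = [
    TitleExample("與友人書", "letter"),   # "Letter to a friend"
    TitleExample("送友人序", "preface"),  # "Farewell preface for a friend"
]
label_ids = encode_labels(examples)
```

To try it, call build_model_and_tokenizer(), fine-tune the returned model on labeled titles (for example with the transformers Trainer), and then pass titles to predict(). Passing the deployed Lepton checkpoint's repository name as checkpoint would skip the fine-tuning step, but that name is not given in the excerpt.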
