Fluency and Faithfulness in Human and Machine Literary Translation

arXiv cs.CL·Sarah Griebel, Ted Underwood

Quick Take

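After controlling for paragraph length, the study finds a consistent negative correlation between fluency and faithfulness in literary translation, which suggests a tradeoff. The pattern holds for human and Google Translate translations but is weaker and often non-significant for TranslateGemma.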

Key Points

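Analyzed 130,486 translated paragraphs from 106 novels in 16 source languages, covering human, Google Translate, and TranslateGemma translations.
Fluency is measured as original-likeness with a translationese classifier trained on paragraph part-of-speech n-grams; faithfulness is measured with COMET-KIWI.
With paragraph length controlled, fluency and faithfulness are negatively correlated for human and Google Translate output; the effect is weaker and often non-significant for TranslateGemma.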
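
To make the fluency features concrete, here is a minimal sketch of how part-of-speech n-gram features for one paragraph might be built. It assumes the paragraph has already been tagged by some POS tagger; the tag names, the choice of n = 1 to 3, and the function names are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def pos_ngram_counts(tags, n_values=(1, 2, 3)):
    """Count part-of-speech n-grams in one paragraph's tag sequence.

    `tags` is a list of POS tags; `n_values` (an assumption here) lists
    the n-gram sizes to extract.
    """
    counts = Counter()
    for n in n_values:
        # Slide a window of width n across the tag sequence.
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


def relative_frequencies(counts):
    """Normalize raw counts so paragraphs of different length are comparable."""
    total = sum(counts.values())
    return {ngram: c / total for ngram, c in counts.items()} if total else {}


# Hypothetical tag sequence for a short paragraph.
example_tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
features = relative_frequencies(pos_ngram_counts(example_tags))
```

Feature dictionaries like `features` could then be fed to any standard classifier that separates translated text from text originally written in the target language; the classifier's score for "original" would serve as the fluency estimate.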

[Submitted on 14 May 2026]

Abstract: Literary translation requires balancing target-language fluency with faithfulness to the source. Recent large language models (LLMs) often produce fluent translations, but it remains unclear whether fluency corresponds to semantic preservation in literary text. We examine this relationship using 130,486 translated paragraphs from 106 novels in 16 source languages, including human, Google Translate, and TranslateGemma translations. Fluency is measured as original-likeness with a translationese classifier trained on paragraph part-of-speech n-grams, and faithfulness with the automatic translation evaluation metric COMET-KIWI. We control for paragraph length and find a consistent negative correlation between fluency and faithfulness. The pattern appears for both human and Google Translate, but is weaker and often non-significant for TranslateGemma. These results show that segment length matters for automatic evaluation and suggest a tradeoff between fluency and faithfulness in literary translation.
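
One common way to control for a third variable such as paragraph length is a partial correlation: regress both scores on length and correlate the residuals. The sketch below shows that technique under the assumption that a simple linear adjustment is adequate. The abstract does not say which procedure the authors used, so the function names and the toy numbers are purely illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def residuals(y, z):
    """Residuals of y after a least-squares linear fit on z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]


def partial_correlation(x, y, z):
    """Correlation of x and y once the linear effect of z is removed."""
    return pearson(residuals(x, z), residuals(y, z))


# Toy data (not from the paper): both scores rise with length, but the
# length-independent parts move in opposite directions.
length = [1, 2, 3, 4, 5, 6]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
fluency = [l + e for l, e in zip(length, noise)]
faithfulness = [l - e for l, e in zip(length, noise)]

raw = pearson(fluency, faithfulness)
adjusted = partial_correlation(fluency, faithfulness, length)
```

In this toy case `raw` is strongly positive while `adjusted` is negative, which illustrates why a shared dependence on segment length can mask or even reverse the relationship between two paragraph-level scores.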

Comments:	Accepted NLP4DH 2026
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2605.15282 [cs.CL]
	(or arXiv:2605.15282v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.15282 arXiv-issued DOI via DataCite

Submission history

From: Sarah Griebel
[v1] Thu, 14 May 2026 18:00:34 UTC (1,971 KB)

Originally published at arxiv.org
