Multilingual Coreference Resolution via Cycle-Consistent Machine Translation
Quick Answer
This study introduces a novel coreference resolution pipeline utilizing machine translation to enhance training data for low-resource languages.
Quick Take
The pipeline machine-translates English coreference samples into a target low-resource language to generate or expand training data. Each translated sample is then back-translated into English and compared with its original by cosine similarity in the latent space of a BERT model, and the resulting score weights that sample in the training loss. Experiments on four low-resource languages show significant gains, including accurate resolution in languages that had no prior corpora.
Key Points
- Proposed pipeline uses machine translation to generate training data for low-resource languages.
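- Translation quality is checked by back-translating each sample and measuring its cosine similarity to the English original in BERT's latent space; the scores weight training samples in the loss.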
- Significant performance gains observed in coreference resolution across four low-resource languages.
- Pipeline enables accurate resolution in languages without existing corpora.
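The cycle-consistency check in the key points can be illustrated with a minimal sketch. It assumes each sentence has already been mapped to a fixed-size vector; the summary only says the comparison happens in the latent space of a BERT model, so how the vectors are produced (pooling, layer choice) is not specified here, and the toy vectors below are hypothetical stand-ins, not real BERT outputs.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector carries no signal
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (placeholders for BERT sentence vectors).
original_emb = [0.9, 0.1, 0.3]        # original English sample
faithful_back_emb = [0.8, 0.2, 0.3]   # back-translation that kept the meaning
drifted_back_emb = [0.1, 0.9, -0.2]   # back-translation that drifted

score_faithful = cosine_similarity(original_emb, faithful_back_emb)
score_drifted = cosine_similarity(original_emb, drifted_back_emb)
print(score_faithful > score_drifted)
```

A back-translation that stays close to the original yields a score near 1, while a drifted one scores lower, which is what lets the score act as an automatic quality signal.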
Article Excerpt
From source RSS / original summary
arXiv:2606.05444v1. Abstract: Coreference resolution is a core NLP task, having a broad range of downstream applications, e.g., machine translation, question answering, document summarization, etc. While the task is well-studied in English, comparatively less attention is dedicated to coreference resolution in other languages, especially low-resource ones.
To mitigate this gap, we propose a novel coreference resolution pipeline that harnesses machine translation (MT) from English to a target low-resource language, to generate or expand training data. To automatically validate the quality of the translated samples, we back-translate the samples and assess the similarity with the original English samples via cosine similarity in the latent space of a BERT model.
The resulting similarity scores are integrated into the loss function to weight training samples according to their MT cycle consistency. Extensive experiments on four low-resource languages show that our pipeline brings significant performance gains in coreference resolution. Moreover, our pipeline enables accurate coreference resolution in languages where no previous corpora were available.
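One way to picture the loss weighting described in the abstract is the sketch below. The abstract does not give the exact formula, so the weighted average used here (each per-sample loss multiplied by its similarity score, normalized by the sum of the scores) is an assumption, and `weighted_loss` is a hypothetical helper name rather than the authors' implementation.

```python
def weighted_loss(per_sample_losses, weights):
    """Average per-sample losses, weighting each by its cycle-consistency score.

    Assumption: weights are non-negative similarity scores and the result is
    normalized by their sum; the paper's actual formulation may differ.
    """
    if len(per_sample_losses) != len(weights):
        raise ValueError("need exactly one weight per sample")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # no trusted samples in this batch
    weighted_sum = sum(w * l for w, l in zip(weights, per_sample_losses))
    return weighted_sum / total_weight

# Hypothetical batch: the second sample was translated poorly, so its
# low similarity score shrinks its contribution to the batch loss.
losses = [0.2, 1.5]
scores = [0.99, 0.15]
print(weighted_loss(losses, scores))
```

Under this assumed scheme, a noisy translation still contributes to training, but in proportion to how well it survives the translation round trip.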