BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking

arXiv cs.CL·Yi Wang, Corina Dima, Liangyu Zhong, Steffen Staab

5/28/2026

Quick Take

BioELX is a two-stage cross-lingual biomedical entity linking framework that improves candidate retrieval and context-aware disambiguation without task-specific annotated training data.

Key Points

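Enriches SapBERT training with Wikidata-derived multilingual aliases to improve cross-lingual candidate retrieval.
Uses a pre-trained LLM ranker for context-aware disambiguation, with no supervised training.
Reports new state-of-the-art results on five benchmarks (XL-BEL, EMEA, Patent, WikiMed-DE, and MedMentions).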

Article Content

From source RSS / original summary

arXiv:2605.27380v1 · Announce Type: new

Abstract: Cross-lingual biomedical entity linking (BEL) maps mentions in any language to unique identifiers in a biomedical knowledge base (KB), supporting clinical and biomedical NLP applications. However, expert-annotated training data for BEL are costly, especially for low-resource languages.

Moreover, many cross-lingual BEL systems rely on SapBERT-based retrievers trained on predominantly English aliases in the KB, leading to poor generalization to unseen non-English mentions and limited context-aware disambiguation. We propose BioELX, a two-stage cross-lingual BEL framework that requires no task-specific annotated training corpora. In Stage 1, we enrich SapBERT training with Wikidata-derived multilingual aliases and use the resulting retriever to improve cross-lingual candidate retrieval.
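As a rough illustration of alias-based candidate retrieval, the sketch below indexes every alias of every KB concept and returns the concepts whose best-matching alias is closest to the mention. It is not the BioELX implementation: the `embed` function is a toy character n-gram encoder standing in for a SapBERT-style model, and the KB identifiers and alias lists are hypothetical.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Toy stand-in for a SapBERT-style encoder: character n-gram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical KB: identifiers and aliases are illustrative only.
KB_ALIASES = {
    "KB:0001": ["diabetes mellitus", "Diabetes", "Zuckerkrankheit", "diabète sucré"],
    "KB:0002": ["hypertension", "high blood pressure", "Bluthochdruck", "hipertansiyon"],
}

# One index entry per alias, so multilingual aliases widen what a mention can match.
ALIAS_INDEX = [
    (embed(alias), kb_id)
    for kb_id, aliases in KB_ALIASES.items()
    for alias in aliases
]

def retrieve(mention, k=5):
    """Return up to k (kb_id, score) pairs, scoring each concept by its best alias."""
    query = embed(mention)
    best = {}
    for vec, kb_id in ALIAS_INDEX:
        best[kb_id] = max(best.get(kb_id, 0.0), cosine(query, vec))
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))[:k]

top_candidates = retrieve("Bluthochdruck", k=2)
```

In this toy setup, removing the German and Turkish aliases from `KB_ALIASES` would leave the non-English mentions with no close match, which is the generalization gap the abstract attributes to predominantly English alias training.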

In Stage 2, we perform context-aware disambiguation with a pre-trained LLM ranker that jointly considers the mention context and candidate, eliminating the need for supervised training. Experiments on five benchmarks (XL-BEL, EMEA, Patent, WikiMed-DE, and MedMentions) show that BioELX achieves new state-of-the-art performance. It improves average Recall@1 on XL-BEL by +19.2, with especially large gains for low-resource languages, e.g., +21.6 on Turkish, +22.1 on Korean, +30.8 on Thai, and delivers consistent improvements on EMEA (+6.2), Patent (+5.4), and WikiMed-DE (+12.8). Code and resources will be released upon publication.
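The second stage and the Recall@1 metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes the ranker scores each (context, candidate) pair independently, the prompt wording is invented, and `toy_score` is a hypothetical word-overlap function standing in for a call to a pre-trained LLM. Recall@1 is reported here as a percentage, i.e. the share of mentions whose gold identifier is ranked first.

```python
def build_prompt(mention, context, candidate_name):
    """Assemble a ranking prompt; the wording is illustrative, not from the paper."""
    return (
        f"Context: {context}\n"
        f"Mention: {mention}\n"
        f"Candidate: {candidate_name}\n"
        "Does the candidate match the mention in this context? Answer yes or no."
    )

def rerank(mention, context, candidates, score_fn):
    """Order candidate KB ids by score_fn(prompt), highest first.

    candidates: list of (kb_id, name) pairs from the retrieval stage.
    score_fn: hypothetical wrapper that maps a prompt to a relevance score.
    """
    scored = [
        (score_fn(build_prompt(mention, context, name)), kb_id)
        for kb_id, name in candidates
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kb_id for _, kb_id in scored]

def toy_score(prompt):
    """Stand-in for an LLM: fraction of candidate words that appear in the context."""
    fields = dict(line.split(": ", 1) for line in prompt.splitlines()[:3])
    context_words = set(fields["Context"].lower().split())
    candidate_words = set(fields["Candidate"].lower().split())
    return len(context_words & candidate_words) / len(candidate_words)

def recall_at_k(ranked_lists, gold_ids, k=1):
    """Percentage of mentions whose gold id appears in the top-k ranked ids."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_ids) if gold in ranked[:k])
    return 100.0 * hits / len(gold_ids)

# Hypothetical candidates for an ambiguous mention.
ranked = rerank(
    mention="cold",
    context="patient has a cold with viral infection symptoms",
    candidates=[("KB:0003", "cold temperature"), ("KB:0004", "common cold viral infection")],
    score_fn=toy_score,
)
score = recall_at_k([ranked, ["KB:0001"]], ["KB:0004", "KB:0002"], k=1)
```

Passing the scorer in as `score_fn` keeps the ranking logic independent of any particular model or API, so the toy function can be swapped for a real LLM call without changing `rerank` or the evaluation.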
