BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking
Quick Take
BioELX is a two-stage cross-lingual biomedical entity linking framework that improves candidate retrieval and context-aware disambiguation without task-specific annotated training data.
Key Points
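- Enriches SapBERT training with Wikidata-derived multilingual aliases to improve cross-lingual candidate retrieval.
- Uses a pre-trained LLM ranker, with no supervised training, for context-aware disambiguation.
- Reports new state-of-the-art results on five benchmarks: XL-BEL, EMEA, Patent, WikiMed-DE, and MedMentions.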
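The retrieve-then-rank pattern in the points above can be illustrated with a minimal sketch. This is not the BioELX implementation: the toy knowledge base, the concept IDs, the character-trigram similarity (a crude stand-in for a SapBERT-style retriever), and the word-overlap scorer (a placeholder for the LLM ranker) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Toy KB: concept ID -> multilingual aliases and a short description.
# IDs, aliases, and descriptions are illustrative, not taken from the paper.
KB = {
    "C001": {"aliases": ["myocardial infarction", "heart attack", "Herzinfarkt"],
             "description": "death of heart muscle caused by blocked blood supply"},
    "C002": {"aliases": ["heart failure", "Herzinsuffizienz"],
             "description": "chronic condition where the heart pumps blood poorly"},
    "C003": {"aliases": ["cerebral infarction", "Hirninfarkt"],
             "description": "stroke caused by blocked blood supply to the brain"},
}

def trigrams(text):
    """Character-trigram counts; a crude stand-in for a learned embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(mention, k=2):
    """Stage 1: score each concept by its best-matching alias, keep the top k."""
    query = trigrams(mention)
    scored = [(max(cosine(query, trigrams(alias)) for alias in entry["aliases"]), cid)
              for cid, entry in KB.items()]
    return [cid for _, cid in sorted(scored, reverse=True)[:k]]

def rank(context, candidates):
    """Stage 2: reorder candidates by how well they fit the mention context.

    Word overlap with the description is a placeholder for an LLM ranker.
    """
    context_words = set(context.lower().split())

    def fit(cid):
        return len(context_words & set(KB[cid]["description"].split()))

    return sorted(candidates, key=fit, reverse=True)

def link(mention, context, k=2):
    """Retrieve k candidates by alias similarity, then return the best-ranked one."""
    return rank(context, retrieve(mention, k))[0]

context = "patient admitted after blocked blood supply to heart muscle"
print(retrieve("infarction"))        # candidates from alias similarity alone
print(link("infarction", context))   # choice after context-aware ranking
```

With this toy data, alias similarity alone places C003 first for the ambiguous mention "infarction", and the context-aware second stage moves C001 to the top. A non-English mention such as "Herzinfarkt" is retrieved directly because the KB entry carries a German alias, which is the intuition behind multilingual alias enrichment.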
Article Content
From source RSS / original summary
arXiv:2605.27380v1 Announce Type: new
Abstract: Cross-lingual biomedical entity linking (BEL) maps mentions in any language to unique identifiers in a biomedical knowledge base (KB), supporting clinical and biomedical NLP applications. However, expert-annotated training data for BEL are costly, especially for low-resource languages. Moreover, many cross-lingual BEL systems rely on SapBERT-based retrievers trained on predominantly English aliases in the KB, leading to poor generalization to unseen non-English mentions and limited context-aware disambiguation. We propose BioELX, a two-stage cross-lingual BEL framework that requires no task-specific annotated training corpora. In Stage 1, we enrich SapBERT training with Wikidata-derived multilingual aliases and use the resulting retriever to improve cross-lingual candidate retrieval. In Stage 2, we perform context-aware disambiguation with a pre-trained LLM ranker that jointly considers the mention context and candidate, eliminating the need for supervised training. Experiments on five benchmarks (XL-BEL, EMEA, Patent, WikiMed-DE, and MedMentions) show that BioELX achieves new state-of-the-art performance. It improves average Recall@1 on XL-BEL by +19.2, with especially large gains for low-resource languages, e.g., +21.6 on Turkish, +22.1 on Korean, +30.8 on Thai, and delivers consistent improvements on EMEA (+6.2), Patent (+5.4), and WikiMed-DE (+12.8). Code and resources will be released upon publication.
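For readers unfamiliar with the metric, here is a minimal sketch of how Recall@1 and a gain between two systems could be computed. It assumes the reported gains are absolute differences in percentage points, which the abstract does not state explicitly; the function name `recall_at_k` and all predictions below are made up for illustration and are not results from the paper.

```python
def recall_at_k(ranked_candidates, gold_ids, k=1):
    """Percentage of mentions whose gold ID appears among the top-k candidates."""
    if len(ranked_candidates) != len(gold_ids):
        raise ValueError("need exactly one candidate list per gold ID")
    if not gold_ids:
        return 0.0
    hits = sum(gold in candidates[:k]
               for candidates, gold in zip(ranked_candidates, gold_ids))
    return 100.0 * hits / len(gold_ids)

# Made-up ranked predictions for four mentions (hypothetical concept IDs).
gold = ["C1", "C2", "C3", "C4"]
baseline = [["C1", "C9"], ["C7", "C2"], ["C8", "C3"], ["C4"]]
improved = [["C1", "C9"], ["C2", "C7"], ["C8", "C3"], ["C4"]]

base_r1 = recall_at_k(baseline, gold)   # 2 of 4 correct at rank 1 -> 50.0
new_r1 = recall_at_k(improved, gold)    # 3 of 4 correct at rank 1 -> 75.0
gain = new_r1 - base_r1                 # difference in percentage points

print(f"Recall@1: {base_r1:.1f} -> {new_r1:.1f} (+{gain:.1f})")
```

Recall@1 only credits a system when the correct identifier is ranked first, so it reflects both stages of a retrieve-then-rank pipeline: the retriever must surface the gold concept, and the ranker must place it at the top.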