Bridging Scientific Heritage: An Arabic-Russian Parallel Corpus and LLM Benchmark for Sustainable Knowledge Transfer
Quick Answer
This paper presents a new Arabic-Russian benchmark for scientific translation, built on a hybrid corpus of about 27,000 sentence pairs, and fine-tunes models such as Qwen2.5-7B, reaching a BLEU score of 23.15.
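BLEU scores such as the 23.15 reported here are on a 0 to 100 scale. They combine clipped n-gram precisions (n = 1 to 4) with a brevity penalty. The following is a minimal sketch under simplifying assumptions: unsmoothed, single-reference, sentence-level BLEU with whitespace tokenization. The function names are illustrative. Published scores are usually computed at corpus level with a standard tool such as sacreBLEU, so this sketch will not reproduce the paper's numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU on a 0-100 scale (whitespace tokens)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped ("modified") counts: a hypothesis n-gram is credited at most
        # as many times as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU 0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses that are shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))
print(sentence_bleu("the cat sat on the", "the cat sat on the mat"))
```

The second call scores below 100 even though every n-gram matches. The brevity penalty, exp(1 - 6/5), discounts the shorter hypothesis.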
Quick Take
This work facilitates knowledge exchange between Arabic and Russian researchers, supporting sustainable partnerships and innovation.
Key Points
- Hybrid corpus of about 27,000 sentence pairs from scientific abstracts and general-domain texts (religion, news, conversations).
- Fine-tuned models include mT5-base, NLLB-200-distilled-1.3B, and Qwen2.5-7B.
- Qwen2.5-7B with QLoRA (rank 8) achieved BLEU 23.15 and COMET 0.758, gains of +4.36 BLEU and +0.051 COMET over the zero-shot baseline.
- Few-shot prompting with three examples did not improve performance, indicating the need for domain-specific fine-tuning.
- Models, corpus, and evaluation code are publicly released to aid research collaboration.
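Few-shot prompting, which the paper found did not help, means prepending a handful of solved translation pairs to the input. The following minimal sketch shows how a three-shot Arabic-to-Russian prompt could be assembled. The instruction wording, the "Arabic:"/"Russian:" labels, the function name, and the placeholder pairs are all hypothetical, because the paper's actual prompt template is not given here. An instruction-tuned model such as Qwen2.5-7B-Instruct would typically also wrap this text in its chat template.

```python
from typing import List, Tuple

# Hypothetical instruction wording; not taken from the paper.
INSTRUCTION = "Translate the following scientific sentence from Arabic into Russian."


def build_few_shot_prompt(examples: List[Tuple[str, str]], source: str, k: int = 3) -> str:
    """Assemble a k-shot prompt from (Arabic, Russian) example pairs.

    With k=0 the result is a plain zero-shot prompt.
    """
    if len(examples) < k:
        raise ValueError(f"need at least {k} examples, got {len(examples)}")
    parts = [INSTRUCTION, ""]
    for ar, ru in examples[:k]:
        parts.append(f"Arabic: {ar}")
        parts.append(f"Russian: {ru}")
        parts.append("")  # blank line between demonstrations
    # The model is expected to continue after the final "Russian:" label.
    parts.append(f"Arabic: {source}")
    parts.append("Russian:")
    return "\n".join(parts)


# Placeholder pairs standing in for real corpus sentences (hypothetical).
pairs = [(f"<Arabic example {i}>", f"<Russian reference {i}>") for i in range(1, 4)]
print(build_few_shot_prompt(pairs, "<Arabic source sentence>"))
```

Setting `k=0` reuses the same template for the zero-shot condition. This keeps the two prompting setups identical except for the demonstrations.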
Article Content
arXiv:2606.30943v1 (Announce Type: new)
Abstract: Russian and Arabic are among the major languages of scientific communication. Language barriers impede the exchange of research results between these communities, which hampers international collaboration and the progress of sustainability-related research. We present a benchmark for Arabic-Russian scientific translation.
The benchmark includes a hybrid parallel corpus of about 27,000 sentence pairs, compiled from scientific abstracts and general-domain texts (religion, news, conversations). We fine-tune three multilingual language models using LoRA with ranks 8, 16, 32, and 64: mT5-base (580M parameters), NLLB-200-distilled-1.3B (1.3B), and Qwen2.5-7B-Instruct (7B). The Qwen2.5-7B model with QLoRA (rank 8) yields BLEU 23.15, chrF 43.89, BERTScore 0.906, and COMET 0.758. These are +4.36 BLEU and +0.051 COMET above the zero-shot baseline.
Few-shot prompting with three examples does not improve performance, indicating that domain-specific fine-tuning is required. We release the models, the corpus, and the evaluation code. By lowering the language barrier for scientific texts, the work enables knowledge exchange between Arabic-speaking and Russian-speaking researchers.
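The LoRA ranks swept in the paper (8, 16, 32, 64) control how many parameters are trained. LoRA freezes a weight matrix W and learns a low-rank update B·A, where A has shape r × d_in and B has shape d_out × r. Each adapted matrix therefore adds r · (d_in + d_out) trainable parameters. QLoRA trains the same adapters but additionally stores the frozen base weights in 4-bit quantized form to save memory. The following sketch counts adapter parameters for a single hypothetical 4096 × 4096 projection. The matrix size and function name are illustrative assumptions, not the dimensions or target modules used in the paper.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Extra trainable parameters LoRA adds to one frozen (d_out x d_in) weight.

    The update is B @ A with A: (rank x d_in) and B: (d_out x rank),
    so only rank * (d_in + d_out) new parameters are trained.
    """
    return rank * (d_in + d_out)


# Hypothetical square projection, for illustration only.
D = 4096
for r in (8, 16, 32, 64):
    added = lora_trainable_params(D, D, r)
    share = 100 * added / (D * D)  # relative to the frozen matrix size
    print(f"rank {r:>2}: {added:>7,} trainable params ({share:.2f}% of the frozen matrix)")
```

The count grows linearly with rank, so each doubling in the sweep doubles the adapter size. Even rank 64 stays a small fraction of the frozen matrix. That the best Qwen2.5-7B result came at rank 8 suggests, for this setup, that more adapter capacity did not pay off.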
It contributes to sustainable partnerships (UN SDG 17) and innovation infrastructure (SDG 9), aligning with the conference's focus on technology-driven sustainable development.
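The abstract reports the fine-tuned Qwen2.5-7B scores and their gains, but not the zero-shot scores themselves. Subtracting recovers the implied baseline. The following short worked calculation assumes both gains are measured against the same zero-shot Qwen2.5-7B run. The variable names are illustrative.

```python
# Reported fine-tuned scores (Qwen2.5-7B, QLoRA rank 8) and gains over zero-shot.
fine_tuned = {"BLEU": 23.15, "COMET": 0.758}
gain_over_zero_shot = {"BLEU": 4.36, "COMET": 0.051}

# Implied zero-shot score = fine-tuned score - reported gain.
zero_shot = {m: round(fine_tuned[m] - gain_over_zero_shot[m], 3) for m in fine_tuned}

# Gain as a percentage of the implied zero-shot score.
relative_gain_pct = {
    m: round(100 * gain_over_zero_shot[m] / zero_shot[m], 1) for m in fine_tuned
}

print(zero_shot)          # {'BLEU': 18.79, 'COMET': 0.707}
print(relative_gain_pct)  # {'BLEU': 23.2, 'COMET': 7.2}
```

In relative terms, the BLEU gain is much larger than the COMET gain. This is consistent with fine-tuning mainly improving surface-level (n-gram) agreement with the references.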