Bridging Scientific Heritage: An Arabic--Russian Parallel Corpus and LLM Benchmark for Sustainable Knowledge Transfer


~1 min read · 7/1/2026 · en

Quick Answer

This paper presents a new Arabic-Russian benchmark for scientific translation. It combines a hybrid parallel corpus of about 27,000 sentence pairs with fine-tuned multilingual models, the best of which, Qwen2.5-7B with QLoRA, reaches BLEU 23.15.

Quick Take

The authors build the benchmark from scientific abstracts and general-domain texts, then fine-tune three multilingual models on it. Fine-tuned Qwen2.5-7B scores 4.36 BLEU above its zero-shot baseline. The stated goal is to ease knowledge exchange between Arabic-speaking and Russian-speaking researchers, in support of sustainable partnerships and innovation.

Key Points

Hybrid corpus of about 27,000 sentence pairs from scientific abstracts and general-domain texts (religion, news, conversations).
Three models are fine-tuned: mT5-base, NLLB-200-distilled-1.3B, and Qwen2.5-7B-Instruct.
Qwen2.5-7B with QLoRA (rank 8) achieved BLEU 23.15, which is 4.36 BLEU above the zero-shot baseline.
Few-shot prompting with three examples did not improve performance, indicating a need for domain-specific fine-tuning.
Models, corpus, and evaluation code are publicly released to aid research collaboration.
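
The abstract reports LoRA ranks of 8, 16, 32, and 64. As a rough illustration of what the rank controls, the sketch below counts the extra trainable parameters that a LoRA adapter adds to a single weight matrix. The 4096 x 4096 layer shape and the helper name `lora_extra_params` are assumptions chosen for illustration; they are not taken from the paper or from any particular model.

```python
# Illustrative only: extra trainable parameters LoRA adds to one weight matrix.
# The 4096 x 4096 shape is a hypothetical example, not a dimension from the paper.

def lora_extra_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA learns W + B @ A, with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)

D_OUT, D_IN = 4096, 4096   # assumed layer shape
full = D_OUT * D_IN        # parameters in the frozen full matrix

for rank in (8, 16, 32, 64):  # ranks listed in the source abstract
    extra = lora_extra_params(D_OUT, D_IN, rank)
    print(f"rank={rank:>2}: {extra:>7,} adapter params ({extra / full:.2%} of the full matrix)")
```

The adapter size grows linearly with the rank, so under these assumed dimensions a rank-8 adapter is one eighth the size of a rank-64 adapter on the same layer.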


Article Content

From source RSS / original summary

arXiv:2606.30943v1. Announce type: new. Abstract: Russian and Arabic are among the major languages of scientific communication. Language barriers impede the exchange of research results between these communities, which affects international collaboration and the progress of sustainability-related research. We present a benchmark for Arabic--Russian scientific translation.

The benchmark includes a hybrid parallel corpus of about 27,000 sentence pairs, compiled from scientific abstracts and general-domain texts (religion, news, conversations). We fine-tune three multilingual language models -- mT5-base (580M parameters), NLLB-200-distilled-1.3B (1.3B), and Qwen2.5-7B-Instruct (7B) -- using LoRA with ranks 8, 16, 32, and 64. The Qwen2.5-7B model with QLoRA (rank 8) yields BLEU 23.15, chrF 43.89, BERTScore 0.906, and COMET 0.758. These are +4.36 BLEU and +0.051 COMET above the zero-shot baseline.

Few-shot prompting with three examples does not improve performance, indicating that domain-specific fine-tuning is required. We release the models, the corpus, and the evaluation code. By lowering the language barrier for scientific texts, the work enables knowledge exchange between Arabic-speaking and Russian-speaking researchers.

It contributes to sustainable partnerships (UN SDG 17) and innovation infrastructure (SDG 9), aligning with the conference's focus on technology-driven sustainable development.
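
The abstract gives the fine-tuned scores and the gains over the zero-shot baseline, but not the baseline scores themselves. A minimal sketch of the arithmetic follows. It assumes the reported gains are absolute differences measured on the same test set; the implied baselines are derived values and may differ slightly from the paper's own tables because of rounding.

```python
from decimal import Decimal

# Scores and gains as reported in the abstract for Qwen2.5-7B with QLoRA (rank 8).
fine_tuned = {"BLEU": Decimal("23.15"), "COMET": Decimal("0.758")}
gain_over_zero_shot = {"BLEU": Decimal("4.36"), "COMET": Decimal("0.051")}

# Derived, not quoted from the paper: implied baseline = fine-tuned score - gain.
implied_baseline = {m: fine_tuned[m] - gain_over_zero_shot[m] for m in fine_tuned}

for metric, base in implied_baseline.items():
    rel = gain_over_zero_shot[metric] / base * 100  # relative gain in percent
    print(f"{metric}: implied zero-shot {base}, relative gain {rel:.1f}%")
```

`Decimal` is used instead of floats so that the subtraction matches the two- and three-decimal figures exactly as printed.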

