RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment

arXiv cs.CL·Jaber Jaber, Osama Jaber

·~2 min·5/29/2026·en·1

Quick Take

RightNow-Arabic-0.5B-Turbo is a 518M-parameter Arabic-specialized decoder LLM built on Qwen2.5-0.5B. It reaches 35.9% mean accuracy across three Arabic benchmarks (COPA-ar, Arabic HellaSwag, ArabicMMLU), ahead of every open model in its size class, and recovers 67% of SILMA-9B's mean at roughly 1/18 the parameters. The q4_k_m edge build is 398 MB and runs at 635 tokens/s at batch size 1 on a single H100 via llama.cpp. Code, weights, and benchmark scripts are released on Hugging Face.

Key Points

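Reaches 35.9% mean accuracy on three Arabic benchmarks (COPA-ar, Arabic HellaSwag, ArabicMMLU), beating every open model in the same size class.
Adds 27,032 Arabic tokens to the Qwen2.5-0.5B vocabulary via mean-subtoken initialization, then continues pretraining on 504M Arabic tokens.
The q4_k_m edge build is 398 MB and delivers 635 tokens/s at batch size 1 on a single H100 via llama.cpp.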
All code, weights, and benchmark scripts are available on Hugging Face.
Finishes training with weight soup merging across three checkpoints.
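
The checkpoint merging in the last point can be illustrated with a minimal sketch. The summary does not say whether the soup is uniform or weighted, so this assumes a uniform (plain average) soup; the function name `uniform_soup` and the toy checkpoints are hypothetical, and real checkpoints would hold tensors, not Python lists.

```python
from typing import Dict, List

# Parameter name -> flattened weights (a stand-in for real tensors).
StateDict = Dict[str, List[float]]


def uniform_soup(checkpoints: List[StateDict]) -> StateDict:
    """Average every parameter element-wise across the given checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share the same parameter names")
    n = len(checkpoints)
    souped: StateDict = {}
    for name in names:
        # zip(*) walks the same position in each checkpoint's weight list.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        souped[name] = [sum(col) / n for col in columns]
    return souped


# Three toy "checkpoints", each with a single two-element parameter.
ckpts = [
    {"w": [1.0, 2.0]},
    {"w": [2.0, 4.0]},
    {"w": [3.0, 6.0]},
]
merged = uniform_soup(ckpts)
print(merged)
```

Averaging only makes sense when the checkpoints share an architecture and a common starting point, which is consistent with merging checkpoints from one training pipeline.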

Article Content

From source RSS / original summary

arXiv:2605.28827v1. Announce Type: new. Abstract: Open Arabic large language models split into two classes: sub-1B multilingual models that treat Arabic as an afterthought (Qwen2.5-0.5B, Falcon-H1-0.5B), and 7B-70B Arabic-specialized models that require a server to run (Jais, AceGPT, ALLaM, SILMA). The one published attempt at a sub-2B Arabic-specialized model, Kuwain-1.5B, never released its weights. We present RightNow-Arabic-0.5B-Turbo, a 518M-parameter Arabic-specialized decoder LLM built on Qwen2.5-0.5B.

The pipeline adds 27,032 Arabic tokens via mean-subtoken initialization, continues pretraining on 504M Arabic tokens on 8xH100 with FSDP, FlashAttention varlen packing, and Liger fused kernels, then applies supervised fine-tuning on 129,116 Arabic instruction pairs with response-only loss masking, direct preference optimization on 6,750 Arabic preference pairs, and weight soup merging across three checkpoints.
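
Two of the steps named above, mean-subtoken initialization and response-only loss masking, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the character-level toy tokenizer, the 2-dimensional embeddings, and the helper names `mean_subtoken_init` and `mask_prompt_labels` are all hypothetical. The `-100` label value follows the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from typing import Callable, List


def mean_subtoken_init(
    new_token: str,
    old_tokenize: Callable[[str], List[int]],
    old_embeddings: List[List[float]],
) -> List[float]:
    """Embed a new token as the mean of the rows the OLD tokenizer splits it into."""
    ids = old_tokenize(new_token)
    if not ids:
        raise ValueError(f"old tokenizer produced no pieces for {new_token!r}")
    dim = len(old_embeddings[0])
    return [sum(old_embeddings[i][d] for i in ids) / len(ids) for d in range(dim)]


IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def mask_prompt_labels(input_ids: List[int], prompt_len: int) -> List[int]:
    """Build labels so that only response tokens contribute to the loss."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]


# Toy setup (assumption): the old tokenizer splits the word into single characters.
toy_vocab = {"ك": 0, "ت": 1, "ا": 2, "ب": 3}
toy_embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]


def toy_tokenize(text: str) -> List[int]:
    return [toy_vocab[ch] for ch in text]


new_row = mean_subtoken_init("كتاب", toy_tokenize, toy_embeddings)
labels = mask_prompt_labels([11, 12, 13, 21, 22], prompt_len=3)
print(new_row, labels)
```

Starting a new token at the mean of its old pieces keeps its initial embedding inside the region the pretrained model already uses, which is the usual motivation for this initialization over random vectors.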

On three lm-evaluation-harness Arabic benchmarks (COPA-ar, Arabic HellaSwag, ArabicMMLU) the merged model reaches 35.9% mean accuracy, beats every same-class open model, ties Falcon-H1-1.5B on COPA-ar (58.4%) at one-third the size, and recovers 67% of SILMA-9B's mean at 1/18 the parameters. The edge build quantizes to 398 MB (q4_k_m) and delivers 635 tokens/s at batch size 1 on a single H100 via llama.cpp.
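
The ratios quoted above can be sanity-checked with simple arithmetic. The sketch below assumes nominal parameter counts taken from the model names (9B for SILMA-9B, 1.5B for Falcon-H1-1.5B) and assumes 1 MB = 10^6 bytes; the implied SILMA-9B mean is back-computed from the 67% figure and is not a number reported in the abstract.

```python
# Figures stated in the abstract.
params_ours = 518e6      # RightNow-Arabic-0.5B-Turbo
mean_ours = 35.9         # mean accuracy (%) over the three benchmarks
recovered = 0.67         # fraction of SILMA-9B's mean that is recovered
gguf_bytes = 398e6       # q4_k_m build, assuming 1 MB = 10^6 bytes

# Nominal sizes read off the model names (assumption).
params_silma = 9e9
params_falcon = 1.5e9

size_ratio_silma = params_silma / params_ours    # about 17.4, quoted as "1/18"
size_ratio_falcon = params_falcon / params_ours  # about 2.9, quoted as "one-third"
implied_silma_mean = mean_ours / recovered       # about 53.6 (derived, not reported)
bits_per_weight = gguf_bytes * 8 / params_ours   # about 6.15, averaged over the file

print(f"SILMA-9B is about {size_ratio_silma:.1f}x larger")
print(f"Falcon-H1-1.5B is about {size_ratio_falcon:.1f}x larger")
print(f"Implied SILMA-9B mean: about {implied_silma_mean:.1f}%")
print(f"Average bits per weight in the edge build: about {bits_per_weight:.2f}")
```

The bits-per-weight figure is an average over the whole file, so it includes any tensors stored at higher precision as well as file metadata.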

All code (5,555 lines across 25 scripts), weights (bf16, int8, and four GGUF quantizations), and benchmark scripts are released at https://huggingface.co/RightNowAI/RightNow-Arabic-0.5B-Turbo.
