RightNow-Arabic-0.5B-Turbo: An Open Sub-1B Arabic Language Model via Vocabulary Injection and Edge-First Deployment
Quick Take
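RightNow-Arabic-0.5B-Turbo is a 518M-parameter Arabic-specialized decoder LLM built on Qwen2.5-0.5B. It reaches 35.9% mean accuracy across three Arabic benchmarks (COPA-ar, Arabic HellaSwag, ArabicMMLU), beating every open model in its size class. The pipeline combines vocabulary injection, continued pretraining, supervised fine-tuning, direct preference optimization, and weight soup merging. The 398 MB q4_k_m edge build delivers 635 tokens/s at batch size 1 on a single H100 via llama.cpp. Code, weights, and benchmark scripts are released on Hugging Face.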
Key Points
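- Achieves 35.9% mean accuracy on three Arabic benchmarks (COPA-ar, Arabic HellaSwag, ArabicMMLU), beating every open model in the same size class.
- Adds 27,032 Arabic tokens to the vocabulary via mean-subtoken initialization, then continues pretraining on 504M Arabic tokens, followed by supervised fine-tuning and direct preference optimization.
- The edge build quantizes to 398 MB (q4_k_m) and delivers 635 tokens/s at batch size 1 on a single H100 via llama.cpp.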
- All code, weights, and benchmark scripts are available on Hugging Face.
- Merges weights across three checkpoints (weight soup) as the final step of the pipeline.
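The checkpoint merge in the last point can be sketched as a uniform weight soup, that is, an element-wise average of matching parameters. This is a minimal illustration only: the summary does not say whether the authors average uniformly or with weights, nor which three checkpoints are used. The `uniform_soup` helper and the toy checkpoints are hypothetical, and parameters are plain Python lists rather than real tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def uniform_soup(checkpoints: List[StateDict]) -> StateDict:
    """Average every parameter element-wise across checkpoints (uniform soup)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = set(checkpoints[0])
    for ckpt in checkpoints[1:]:
        if set(ckpt) != names:
            raise ValueError("checkpoints must share the same parameter names")
    soup: StateDict = {}
    for name in checkpoints[0]:
        tensors = [ckpt[name] for ckpt in checkpoints]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*tensors) walks the same position in every checkpoint.
        soup[name] = [sum(vals) / len(vals) for vals in zip(*tensors)]
    return soup


# Three toy checkpoints (assumed values, for illustration only).
ckpt_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ckpt_b = {"layer.weight": [2.0, 4.0], "layer.bias": [3.0]}
ckpt_c = {"layer.weight": [3.0, 6.0], "layer.bias": [6.0]}

merged = uniform_soup([ckpt_a, ckpt_b, ckpt_c])
print(merged)  # {'layer.weight': [2.0, 4.0], 'layer.bias': [3.0]}
```

Averaging only makes sense when the checkpoints share one architecture and parameter naming, which is why the sketch rejects mismatched names or shapes instead of silently skipping them.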
Article Content
From source RSS / original summary: arXiv:2605.28827v1 (announce type: new)
Abstract: Open Arabic large language models split into two classes: sub-1B multilingual models that treat Arabic as an afterthought (Qwen2.5-0.5B, Falcon-H1-0.5B), and 7B-70B Arabic-specialized models that require a server to run (Jais, AceGPT, ALLaM, SILMA). The one published attempt at a sub-2B Arabic-specialized model, Kuwain-1.5B, never released its weights. We present RightNow-Arabic-0.5B-Turbo, a 518M-parameter Arabic-specialized decoder LLM built on Qwen2.5-0.5B.
The pipeline adds 27,032 Arabic tokens via mean-subtoken initialization, continues pretraining on 504M Arabic tokens on 8xH100 with FSDP, FlashAttention varlen packing, and Liger fused kernels, then applies supervised fine-tuning on 129,116 Arabic instruction pairs with response-only loss masking, direct preference optimization on 6,750 Arabic preference pairs, and weight soup merging across three checkpoints.
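Two of the steps named above, mean-subtoken initialization and response-only loss masking, can be illustrated with a small dependency-free sketch. It is not the authors' code. The sketch assumes that "mean-subtoken initialization" means setting a new token's embedding to the average of the embeddings of the pieces the original tokenizer splits it into, and that loss masking is done by replacing prompt labels with an ignore value. The function names, the character-level toy tokenizer, and the 2-dimensional embeddings are all hypothetical.

```python
from typing import Callable, List, Sequence

# Label value that PyTorch's cross-entropy loss skips by default (ignore_index=-100).
IGNORE_INDEX = -100


def mean_subtoken_init(
    new_token: str,
    old_tokenize: Callable[[str], List[int]],
    old_embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """Return the mean of the old embeddings of the pieces that the ORIGINAL
    tokenizer would have split `new_token` into."""
    piece_ids = old_tokenize(new_token)
    if not piece_ids:
        raise ValueError(f"old tokenizer produced no pieces for {new_token!r}")
    rows = [old_embeddings[i] for i in piece_ids]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


def response_only_labels(prompt_ids: List[int], response_ids: List[int]) -> List[int]:
    """Build training labels so that only response tokens contribute to the loss."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)


# Toy setup (hypothetical): a character-level "old" tokenizer with 2-d embeddings.
old_vocab = {"ك": 0, "ت": 1, "ا": 2, "ب": 3}
old_embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]


def toy_tokenize(text: str) -> List[int]:
    return [old_vocab[ch] for ch in text]


# The new whole-word token starts at the mean of its four old pieces.
new_row = mean_subtoken_init("كتاب", toy_tokenize, old_embeddings)
labels = response_only_labels(prompt_ids=[5, 6, 7], response_ids=[11, 12])

print(new_row)  # [4.0, 3.0]
print(labels)   # [-100, -100, -100, 11, 12]
```

Starting a new token at the mean of its old pieces places it in a region of embedding space the pretrained model already handles, rather than at a random point that continued pretraining would first have to correct.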
On three lm-evaluation-harness Arabic benchmarks (COPA-ar, Arabic HellaSwag, ArabicMMLU) the merged model reaches 35.9% mean accuracy, beats every same-class open model, ties Falcon-H1-1.5B on COPA-ar (58.4%) at one-third the size, and recovers 67% of SILMA-9B's mean at 1/18 the parameters. The edge build quantizes to 398 MB (q4_k_m) and delivers 635 tokens/s at batch size 1 on a single H100 via llama.cpp.
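The ratios quoted above can be unpacked with a little arithmetic. The numbers below are back-of-the-envelope derivations from the abstract's figures, not results reported by the authors. They assume that "9B" means exactly 9e9 parameters, that 1 MB is 10^6 bytes, and that the whole GGUF file is weight data, which ignores metadata.

```python
# Figures taken from the abstract.
model_params = 518e6        # RightNow-Arabic-0.5B-Turbo parameter count
mean_accuracy = 35.9        # mean accuracy (%) over the three benchmarks
recovered_fraction = 0.67   # share of SILMA-9B's mean that is recovered
gguf_size_mb = 398          # q4_k_m file size
tokens_per_second = 635     # batch size 1, single H100, llama.cpp

# Assumption: nominal 9e9 parameters for SILMA-9B (exact count not in the source).
silma_params = 9e9

# How many times larger the 9B model is (the abstract rounds this to 18).
param_ratio = silma_params / model_params

# SILMA-9B mean accuracy implied by "recovers 67%".
implied_silma_mean = mean_accuracy / recovered_fraction

# Average storage per parameter in the quantized file, in bits.
bits_per_weight = gguf_size_mb * 1e6 * 8 / model_params

# Average time per generated token at the reported throughput.
ms_per_token = 1000 / tokens_per_second

print(round(param_ratio, 1))         # 17.4
print(round(implied_silma_mean, 1))  # 53.6
print(round(bits_per_weight, 2))     # 6.15
print(round(ms_per_token, 2))        # 1.57
```

Under these assumptions, the implied SILMA-9B mean is about 53.6%, the quantized file averages roughly 6 bits per parameter, and each token takes about 1.6 ms at batch size 1.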
All code (5,555 lines across 25 scripts), weights (bf16, int8, and four GGUF quantizations), and benchmark scripts are released at https://huggingface.co/RightNowAI/RightNow-Arabic-0.5B-Turbo.