NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code
Quick Take
NVIDIA has launched Polar, a rollout framework for training language agents with GRPO without modifying their agent harnesses. A model API proxy sits between the harness and the inference server and captures token-level interactions. Using a Qwen3.5-4B base model, Polar raises SWE-Bench Verified pass@1 by 22.6 points under Codex, 4.8 points under Claude Code, and 6.2 points under Pi. It is available as a NeMo Gym environment.
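GRPO scores each rollout relative to the other rollouts sampled for the same task, rather than against a learned value model. The source does not spell out Polar's exact advantage formula, so the following is a minimal sketch of the standard group-normalized advantage, assuming a binary pass/fail reward per SWE-Bench rollout. The function name `group_relative_advantages` and the `eps` guard are illustrative choices, not part of Polar.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: normalize each rollout's reward against its group.

    All rewards in `rewards` are assumed to come from rollouts of the SAME task.
    Population std is used here; some GRPO variants use sample std or skip
    the division entirely.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout got the same reward,
    # in which case every advantage is 0 and the group carries no signal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed example: 4 rollouts on one task, reward 1.0 if the tests pass.
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)
print(adv)  # passing rollouts get positive advantage, failing ones negative
```

A group where every rollout passes (or every rollout fails) yields zero advantage, which is one reason task difficulty matters for GRPO on agentic benchmarks.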
Key Points
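- Polar uses a model API proxy to capture token-level interactions and rebuild trainer-ready trajectories, with no changes to the agent harness.
- Improves SWE-Bench Verified pass@1 by 22.6 points under the Codex harness (GRPO on a Qwen3.5-4B base model).
- Achieves a 4.8-point gain under Claude Code and a 6.2-point gain under Pi.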
- Framework registered as a NeMo Gym environment.
- Released under the ProRL Agent Server repository.
Article Excerpt
From source RSS / original summary
NVIDIA researchers have introduced Polar, a rollout framework that trains language agents using reinforcement learning without modifying their agent harnesses. Polar places a model API proxy between the harness and the inference server, capturing token-level interactions and reconstructing trainer-ready trajectories. Using GRPO on a Qwen3.5-4B base model, Polar improves SWE-Bench Verified pass@1 by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi.
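The "token-faithful" framing suggests the trainer should see exactly the token IDs the inference server produced, rather than text that gets re-tokenized afterwards. The source does not describe how Polar stitches proxied calls into trajectories, so this is only an illustration under stated assumptions. A hypothetical proxy logs each call's prompt token IDs, sampled completion IDs, and per-token log-probabilities, and consecutive calls are merged when a new prompt extends the previous sequence. Model-generated tokens get loss mask 1. Prompt, tool-output, and other context tokens get 0. `ProxiedCall`, `Segment`, and `reconstruct` are hypothetical names, not Polar's API.

```python
from dataclasses import dataclass, field


@dataclass
class ProxiedCall:
    """One request/response pair as a hypothetical proxy might log it."""
    prompt_ids: list[int]             # exact token ids sent to the inference server
    completion_ids: list[int]         # exact token ids the server sampled
    completion_logprobs: list[float]  # one log-prob per completion token


@dataclass
class Segment:
    token_ids: list[int] = field(default_factory=list)
    loss_mask: list[int] = field(default_factory=list)   # 1 = model-generated, train on it
    logprobs: list[float] = field(default_factory=list)  # 0.0 placeholder for context tokens


def reconstruct(calls: list[ProxiedCall]) -> list[Segment]:
    """Merge logged calls into training segments without re-tokenizing text."""
    segments: list[Segment] = []
    current: Segment | None = None
    for call in calls:
        n = 0 if current is None else len(current.token_ids)
        if current is not None and call.prompt_ids[:n] == current.token_ids:
            # Prompt extends the running sequence: only the tail is new context
            # (e.g. tool output the harness appended after the last completion).
            new_ctx = call.prompt_ids[n:]
        else:
            # First call, or the harness rewrote history: start a fresh segment.
            current = Segment()
            segments.append(current)
            new_ctx = call.prompt_ids
        current.token_ids += new_ctx
        current.loss_mask += [0] * len(new_ctx)
        current.logprobs += [0.0] * len(new_ctx)
        current.token_ids += call.completion_ids
        current.loss_mask += [1] * len(call.completion_ids)
        current.logprobs += call.completion_logprobs
    return segments


# Assumed toy log: call 2 appends tool output (6, 7); call 3 rewrites history.
calls = [
    ProxiedCall([1, 2, 3], [4, 5], [-0.1, -0.2]),
    ProxiedCall([1, 2, 3, 4, 5, 6, 7], [8], [-0.3]),
    ProxiedCall([1, 9], [10], [-0.4]),
]
segs = reconstruct(calls)
for s in segs:
    print(s.token_ids, s.loss_mask)
```

If a harness compresses or rewrites earlier turns, the prefix check fails and a new segment starts, so every training sequence matches what the model actually saw at sampling time.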
The framework is registered as a NeMo Gym environment and released under the ProRL Agent Server repository. The post NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code appeared first on MarkTechPost.

