
Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b
Quick Answer
Harness-1, a 20B retrieval subagent from UIUC and Chroma built on gpt-oss-20b, is trained with reinforcement learning inside a stateful search harness. It reaches 0.730 average curated recall across eight benchmarks, 11.4 points ahead of the next open subagent.
Quick Take
Harness-1 splits retrieval work between a harness that keeps the bookkeeping and an RL-trained policy that makes the decisions. Across eight benchmarks it averages 0.730 curated recall, beating the next open subagent by 11.4 points and trailing only Opus-4.6. The model's weights and harness code are publicly available.
Key Points
- Harness-1 is trained with reinforcement learning inside a stateful search harness.
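- The harness maintains the candidate pool, importance-tagged curated set, evidence graph, and verification records; the policy decides what to search, curate, and verify, and when to stop.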
- Reaches 0.730 average curated recall across eight benchmarks, trailing only Opus-4.6.
- Model weights and harness code are publicly released.
- Beats the next open subagent by 11.4 points across the same eight benchmarks.
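The summary reports curated recall as a single average over eight benchmarks but does not define the metric. A minimal sketch, assuming curated recall is the fraction of gold documents that end up in the subagent's curated set, averaged per benchmark and then unweighted (macro) across benchmarks. The function names and toy document IDs are hypothetical and not from the Harness-1 release.

```python
from statistics import mean


def curated_recall(curated_ids: set[str], gold_ids: set[str]) -> float:
    """Assumed definition: fraction of gold docs present in the curated set."""
    if not gold_ids:
        return 0.0  # hypothetical convention for queries with no gold docs
    return len(curated_ids & gold_ids) / len(gold_ids)


def benchmark_recall(queries: list[tuple[set[str], set[str]]]) -> float:
    """Mean curated recall over (curated, gold) pairs for one benchmark."""
    return mean(curated_recall(curated, gold) for curated, gold in queries)


def average_over_benchmarks(per_benchmark: dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean across benchmarks."""
    return mean(per_benchmark.values())


# Toy usage with made-up document IDs (not real benchmark data).
per_benchmark = {
    "bench_a": benchmark_recall([({"d1", "d2"}, {"d1", "d3"})]),
    "bench_b": benchmark_recall([({"d4"}, {"d4"})]),
}
print(average_over_benchmarks(per_benchmark))  # 0.75
```

Under this reading, and taking "points" as hundredths of the 0 to 1 recall scale, an 11.4-point margin would place the next open subagent near 0.616 average curated recall.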
Article Excerpt
UIUC and Chroma's Harness-1 is a 20B retrieval subagent trained with reinforcement learning inside a stateful search harness. The harness maintains the bookkeeping (candidate pool, importance-tagged curated set, evidence graph, verification records), while the policy decides what to search, curate, and verify, and when to stop. It reaches 0.730 average curated recall across eight benchmarks, beating the next open subagent by 11.4 points and trailing only Opus-4.6. Weights and harness code are public.
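The division of labor described in the excerpt, with the harness holding state and the policy choosing actions, can be illustrated with a small sketch. Every name here (`Action`, `HarnessState`, `apply`, `run_episode`, `toy_search`) is hypothetical and does not reflect the released harness API; a scripted list of steps stands in for the RL-trained policy, and no training is shown.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Action(Enum):
    SEARCH = "search"
    CURATE = "curate"
    VERIFY = "verify"
    STOP = "stop"


@dataclass
class HarnessState:
    """Hypothetical bookkeeping kept by the harness, not by the policy."""
    candidate_pool: dict[str, str] = field(default_factory=dict)       # doc_id -> text
    curated: dict[str, str] = field(default_factory=dict)              # doc_id -> importance tag
    evidence_graph: dict[str, set[str]] = field(default_factory=dict)  # doc_id -> linked doc_ids
    verification: dict[str, bool] = field(default_factory=dict)        # doc_id -> supported?

    def apply(self, action: Action, arg: dict,
              search_fn: Callable[[str], dict[str, str]]) -> bool:
        """Record one policy action; return False when the episode should end."""
        if action is Action.SEARCH:
            self.candidate_pool.update(search_fn(arg["query"]))
        elif action is Action.CURATE:
            doc_id = arg["doc_id"]
            if doc_id in self.candidate_pool:  # only curate docs that were retrieved
                self.curated[doc_id] = arg.get("importance", "supporting")
                for other in arg.get("links", []):
                    self.evidence_graph.setdefault(doc_id, set()).add(other)
        elif action is Action.VERIFY:
            self.verification[arg["doc_id"]] = bool(arg["supported"])
        elif action is Action.STOP:
            return False
        return True


def run_episode(policy_steps: list[tuple[Action, dict]],
                search_fn: Callable[[str], dict[str, str]],
                max_steps: int = 20) -> HarnessState:
    """Feed policy decisions to the harness until STOP or the step budget runs out."""
    state = HarnessState()
    for step, (action, arg) in enumerate(policy_steps):
        if step >= max_steps or not state.apply(action, arg, search_fn):
            break
    return state


def toy_search(query: str) -> dict[str, str]:
    """Toy keyword search over a three-document corpus."""
    corpus = {"d1": "harness paper", "d2": "rl training", "d3": "unrelated"}
    return {k: v for k, v in corpus.items() if any(w in v for w in query.split())}


steps = [
    (Action.SEARCH, {"query": "harness rl"}),
    (Action.CURATE, {"doc_id": "d1", "importance": "key", "links": ["d2"]}),
    (Action.VERIFY, {"doc_id": "d1", "supported": True}),
    (Action.STOP, {}),
]
state = run_episode(steps, toy_search)
print(sorted(state.candidate_pool))  # ['d1', 'd2']
print(state.curated)                 # {'d1': 'key'}
```

Keeping the state in the harness means the policy only has to emit the next action, which matches the excerpt's split between bookkeeping and decision-making.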
The post Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b appeared first on MarkTechPost.