LabGuard: Grounding Natural-Language Laboratory Rules into Runtime Guards for Embodied Laboratory Agents
Quick Answer
LabGuard is a language-to-execution safety suite that grounds natural-language laboratory rules into executable specifications and deploys them as runtime guards. After monitor compilation, unsafe events fall from 39.5% to 23.8%.
Quick Take
LabGuard achieves a task-scope F1 of 79.4 and generalizes to unseen laboratory-rule sources. In LabUtopia, its runtime monitors integrate with ACT, keeping interventions below 0.5% while preserving task success.
Key Points
- LabGuard consists of LabGuard-IR, LabGuard-Bench, and LabGuard-Grounder.
- LabGuard-Bench provides 812 supervised annotations expanded from 203 seed laboratory rules.
- In experiments, LabGuard generalizes to unseen laboratory-rule sources.
- The LabGuard Pipeline compiles IR instances into runtime monitors and applies them at the controller boundary.
- Unsafe events drop from 39.5% to 23.8% after monitor compilation.
Article Content
Source: arXiv:2606.31045v1 (original abstract)
Scientific embodied agents are increasingly capable of carrying out laboratory procedures, but executing these procedures safely in dynamic laboratory environments remains challenging. Current safety approaches often overlook the intermediate step of transforming laboratory natural language, including safety rules, manuals, protocols, and standard operating procedures, into machine-checkable runtime constraints.
We introduce LabGuard (Laboratory Guard), a language-to-execution safety suite that grounds natural-language laboratory rules into executable specifications and deploys them as runtime guards. LabGuard includes three core components: LabGuard-IR, which defines a typed executable representation; LabGuard-Bench, which provides 812 supervised annotations expanded from 203 seed laboratory rules; and LabGuard-Grounder, which maps natural-language laboratory rules into LabGuard-IR.
The resulting IR instances are handled by the LabGuard Pipeline, which compiles them into runtime monitors and applies them at the controller boundary. Experiments show that LabGuard generalizes to unseen laboratory-rule sources, achieves 79.4 task-scope F1, and reduces unsafe events from 39.5% to 23.8% after monitor compilation. In LabUtopia, its runtime monitors integrate with ACT, keeping interventions below 0.5% while preserving task success.
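To make the idea of compiling a typed rule into a runtime monitor at the controller boundary concrete, here is a minimal Python sketch. It is not LabGuard-IR or the LabGuard Pipeline: the names Rule, Action, compile_rule, and GuardedController, the example rule, and its 60 °C threshold are all hypothetical and chosen only for illustration. The abstract does not describe the actual IR schema or monitor interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Every name here is hypothetical; none of it is the LabGuard API.


@dataclass(frozen=True)
class Rule:
    """Toy typed rule: forbid `action` while state[`field`] exceeds `limit`."""
    rule_id: str
    action: str
    field: str
    limit: float


@dataclass(frozen=True)
class Action:
    name: str
    target: str


State = Dict[str, float]
Monitor = Callable[[Action, State], Optional[str]]


def compile_rule(rule: Rule) -> Monitor:
    """Compile one rule into a monitor that returns the violated rule id, or None."""
    def monitor(action: Action, state: State) -> Optional[str]:
        if action.name == rule.action and state.get(rule.field, 0.0) > rule.limit:
            return rule.rule_id
        return None
    return monitor


class GuardedController:
    """Checks each proposed action against all monitors before forwarding it."""

    def __init__(self, monitors: List[Monitor], execute: Callable[[Action], None]):
        self.monitors = monitors
        self.execute = execute
        self.interventions = 0
        self.total = 0

    def step(self, action: Action, state: State) -> bool:
        """Forward the action if no monitor fires; otherwise block and count it."""
        self.total += 1
        violated = [r for m in self.monitors if (r := m(action, state)) is not None]
        if violated:
            self.interventions += 1
            return False
        self.execute(action)
        return True

    @property
    def intervention_rate(self) -> float:
        return self.interventions / self.total if self.total else 0.0


# Hypothetical rule: do not open a vessel hotter than 60 degrees Celsius.
executed: List[Action] = []
rule = Rule("no-open-hot-vessel", action="open_lid", field="vessel_temp_c", limit=60.0)
controller = GuardedController([compile_rule(rule)], executed.append)

controller.step(Action("open_lid", "vessel_1"), {"vessel_temp_c": 25.0})  # forwarded
controller.step(Action("open_lid", "vessel_1"), {"vessel_temp_c": 95.0})  # blocked
```

In this sketch the policy that proposes actions (ACT in the paper's LabUtopia experiments) is never modified; the guard only sits between the policy and the executor, which is one plausible reading of "controller boundary". The intervention counter mirrors the kind of rate the abstract reports, although how LabGuard measures it is not specified here.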