LabGuard: Grounding Natural-Language Laboratory Rules into Runtime Guards for Embodied Laboratory Agents

arXiv cs.AI·Jingpu Yang, Fengxian Ji, Zhengzhao Lai, Zhexuan Cui, Guangxian Ouyang, Qian Jiang, Fan Zhang, Min Peng, Qianqian Xie, Preslav Nakov, Zhuohan Xie

7/1/2026

Quick Answer

LabGuard is a language-to-execution safety suite that grounds natural-language laboratory rules into executable specifications and deploys them as runtime guards. Unsafe events fall from 39.5% to 23.8% after monitor compilation.

Quick Take

LabGuard generalizes to unseen laboratory-rule sources and achieves a task-scope F1 of 79.4. In the LabUtopia environment, its runtime monitors integrate with ACT and keep interventions below 0.5% while preserving task success.

Key Points

LabGuard consists of LabGuard-IR, LabGuard-Bench, and LabGuard-Grounder.
LabGuard-Bench provides 812 supervised annotations expanded from 203 seed laboratory rules.
LabGuard generalizes to unseen laboratory-rule sources and achieves a task-scope F1 of 79.4.
Runtime monitors compiled from the IR instances are applied at the controller boundary.
Unsafe events drop from 39.5% to 23.8% after monitor compilation.
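
To put the reported figures in perspective, the short calculation below derives the absolute and relative reduction in unsafe events and the average number of annotations per seed rule. Only the four input numbers come from the summary; the derived values are plain arithmetic, and the average says nothing about how annotations are actually distributed across rules.

```python
# Figures reported in the summary above.
unsafe_before = 39.5  # percent of unsafe events before monitor compilation
unsafe_after = 23.8   # percent of unsafe events after monitor compilation
annotations = 812     # supervised annotations in LabGuard-Bench
seed_rules = 203      # seed laboratory rules

# Derived values (arithmetic only, not reported by the authors).
absolute_drop = unsafe_before - unsafe_after         # percentage points
relative_drop = absolute_drop / unsafe_before * 100  # percent of the baseline
annotations_per_rule = annotations / seed_rules      # average, not a per-rule count

print(f"absolute drop: {absolute_drop:.1f} percentage points")
print(f"relative drop: {relative_drop:.1f}%")
print(f"annotations per seed rule (average): {annotations_per_rule:.1f}")
```

The drop is 15.7 percentage points, roughly a 39.7% relative reduction, and the benchmark averages 4 annotations per seed rule.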

Article Content

From source RSS / original summary

arXiv:2606.31045v1. Abstract: Scientific embodied agents are increasingly capable of carrying out laboratory procedures, but executing these procedures safely in dynamic laboratory environments remains challenging. Current safety approaches often overlook the intermediate step of transforming laboratory natural language, including safety rules, manuals, protocols, and standard operating procedures, into machine-checkable runtime constraints.

We introduce LabGuard (Laboratory Guard), a language-to-execution safety suite that grounds natural-language laboratory rules into executable specifications and deploys them as runtime guards. LabGuard includes three core components: LabGuard-IR, which defines a typed executable representation; LabGuard-Bench, which provides 812 supervised annotations expanded from 203 seed laboratory rules; and LabGuard-Grounder, which maps natural-language laboratory rules into LabGuard-IR.
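
The abstract does not spell out the LabGuard-IR schema, so the following is only a minimal sketch of what a typed executable representation of one rule could look like. The `RuleIR` class, its fields, the `Comparator` enum, and the example rule text are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Comparator(Enum):
    """Hypothetical comparison operators for a bound on a sensed quantity."""
    LE = "<="
    GE = ">="


@dataclass(frozen=True)
class RuleIR:
    """Hypothetical typed rule: a bound on one quantity during one action."""
    source_text: str        # natural-language rule the instance was grounded from
    action: str             # controller action the rule applies to
    quantity: str           # name of the state variable to check
    comparator: Comparator  # direction of the bound
    threshold: float        # numeric limit
    unit: str               # unit of the limit, kept explicit to avoid mix-ups

    def holds(self, value: float) -> bool:
        """Return True when the observed value satisfies the bound."""
        if self.comparator is Comparator.LE:
            return value <= self.threshold
        return value >= self.threshold


# Invented example rule, used only to exercise the sketch.
rule = RuleIR(
    source_text="Do not heat the beaker above 80 degrees Celsius.",
    action="heat",
    quantity="beaker_temp",
    comparator=Comparator.LE,
    threshold=80.0,
    unit="degC",
)

print(rule.holds(75.0), rule.holds(90.0))
```

Keeping the fields typed is what makes such an instance machine-checkable: a grounder that emits an unknown comparator or a non-numeric threshold can be rejected before anything reaches the robot.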

The resulting IR instances are handled by the LabGuard Pipeline, which compiles them into runtime monitors and applies them at the controller boundary. Experiments show that LabGuard generalizes to unseen laboratory-rule sources, achieves 79.4 task-scope F1, and reduces unsafe events from 39.5% to 23.8% after monitor compilation. In LabUtopia, its runtime monitors integrate with ACT, keeping interventions below 0.5% while preserving task success.
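
As a rough illustration of compiling a rule into a runtime monitor and applying it at the controller boundary, the sketch below wraps each proposed action in a check before it would be executed. Everything here is an assumption made for the example: `compile_rule`, `guarded_step`, the `"hold"` fallback action, and the state layout are hypothetical and do not reflect the actual LabGuard Pipeline, ACT, or LabUtopia interfaces.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Monitor = Callable[[str, State], bool]  # True means the action is allowed


def compile_rule(action: str, quantity: str, max_value: float) -> Monitor:
    """Turn one upper-bound rule into a predicate over (action, state)."""
    def monitor(proposed_action: str, state: State) -> bool:
        if proposed_action != action:
            return True  # rule does not apply to this action
        # A missing sensor reading is treated as a violation (fail closed).
        return state.get(quantity, float("inf")) <= max_value
    return monitor


def guarded_step(action: str, state: State,
                 monitors: List[Monitor]) -> Tuple[str, bool]:
    """Return (action_to_execute, intervened) for one control step."""
    if all(m(action, state) for m in monitors):
        return action, False
    return "hold", True  # hypothetical safe fallback action


monitors = [compile_rule("heat", "beaker_temp", 80.0)]

# A tiny hand-written trace of (proposed action, sensed state) pairs.
trace = [
    ("heat", {"beaker_temp": 60.0}),
    ("heat", {"beaker_temp": 95.0}),
    ("pour", {"beaker_temp": 95.0}),
]
results = [guarded_step(a, s, monitors) for a, s in trace]
interventions = sum(1 for _, intervened in results if intervened)
print(results, interventions / len(trace))
```

Placing the check between the policy and the actuator means the policy itself stays unchanged, and the intervention rate (the quantity the abstract reports as below 0.5%) can be measured by counting overridden steps.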
