SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification

arXiv cs.AI·Xiangyu Zhao, Hengyuan Zhao, Yiheng Wang, Wanghan Xu, Yuhao Zhou, Qinglong Cao, Zhiwang Zhou, Lei Bai, Wenlong Zhang, Xiao-Ming Wu

6/4/2026

Quick Answer

Sci-PRM is a tool-aware process reward model for verifying scientific reasoning in complex domains such as biology, chemistry, and physics. It builds on a new dataset, SCIPRM70K, whose trajectories interleave reasoning with tool usage.

Quick Take

It improves foundation models through effective test-time scaling and provides dense reward signals in Reinforcement Learning, addressing hallucination issues and performance limitations.
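To illustrate why dense reward signals can matter in Reinforcement Learning, here is a minimal sketch. It assumes a group-normalized advantage (reward minus group mean, divided by group standard deviation) and mean aggregation of per-step scores. Neither choice is confirmed by the source, and the step scores are made-up values, not outputs of Sci-PRM.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-normalized advantages: (r - mean) / std, or all zeros when std is 0."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # Every sample got the same reward, so no sample is preferred.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Outcome-only rewards: all four sampled solutions reached a wrong final answer.
outcome_rewards = [0.0, 0.0, 0.0, 0.0]

# Hypothetical per-step scores for the same four solutions (illustrative only).
step_scores = [
    [0.9, 0.8, 0.1],
    [0.9, 0.2, 0.1],
    [0.3, 0.2, 0.1],
    [0.9, 0.9, 0.4],
]
# Assumption: a solution's dense reward is the mean of its step scores.
dense_rewards = [mean(scores) for scores in step_scores]

outcome_adv = group_advantages(outcome_rewards)  # all zeros: no learning signal
dense_adv = group_advantages(dense_rewards)      # non-zero: solutions are ranked
```

With outcome-only rewards, the group gives no gradient signal because every advantage is zero. The step-level scores still separate the solutions whose early steps were sound from those that went wrong immediately.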

Key Points

Introduces SCIPRM70K, a dataset with Chain-of-Tool trajectories for scientific reasoning.
Sci-PRM model improves tool selection and execution accuracy during inference.
Enables Best-of-N selection for effective test-time scaling in foundation models.
Provides dense reward signals in Reinforcement Learning to mitigate advantage disappearance.
Significantly enhances performance ceilings in scientific reasoning tasks.
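The Best-of-N selection mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the function names, the toy scorer, and the choice of min as the aggregation over step scores are all hypothetical.

```python
from typing import Callable, List, Sequence

Candidate = List[str]  # one candidate solution = an ordered list of step strings


def best_of_n(
    candidates: Sequence[Candidate],
    score_step: Callable[[Candidate, int], float],
    aggregate: Callable[[List[float]], float] = min,
) -> Candidate:
    """Return the candidate whose aggregated step scores are highest."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def solution_score(candidate: Candidate) -> float:
        step_scores = [score_step(candidate, i) for i in range(len(candidate))]
        # An empty candidate can never win against a scored one.
        return aggregate(step_scores) if step_scores else float("-inf")

    return max(candidates, key=solution_score)


# Toy stand-in for a process reward model (hypothetical): it gives a low score
# to any step whose text is marked as unverified.
def toy_score_step(candidate: Candidate, index: int) -> float:
    return 0.1 if "unverified" in candidate[index] else 0.9


candidates = [
    ["look up molar mass", "unverified guess of yield"],
    ["look up molar mass", "compute yield with tool"],
]
chosen = best_of_n(candidates, toy_score_step)
```

Aggregating with min means one weak step sinks the whole candidate. Passing a different `aggregate`, such as a mean or a product, changes how forgiving the selection is.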

Source Excerpt

From the original publisher, up to about 700 characters

arXiv:2606.04579v1. Abstract: While Process Reward Models (PRMs) have achieved remarkable success in mathematical reasoning, their application in complex scientific domains, such as biology, chemistry, and physics, remains largely unexplored. Scientific problems demand not only logical rigor but also factual consistency and the precise usage of domain-specific tools, areas where current models often suffer from hallucinations and lack of verification.

In this paper, we first construct SCIPRM70K, a large-scale dataset featuring Chain-of-Tool trajectories that explicitly interleave reasoning with the execution of scientific tools. …
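One way to picture a trajectory that interleaves reasoning with tool execution is the data-structure sketch below. The class names, fields, label convention, and the `molar_mass` tool are hypothetical illustrations; the excerpt does not describe the actual SCIPRM70K schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolCall:
    name: str        # hypothetical tool identifier
    arguments: dict  # inputs passed to the tool
    result: str      # raw output returned by the tool


@dataclass
class Step:
    reasoning: str                        # natural-language reasoning for this step
    tool_call: Optional[ToolCall] = None  # present only when the step runs a tool
    label: Optional[int] = None           # assumed convention: 1 = correct, 0 = incorrect


@dataclass
class Trajectory:
    question: str
    steps: List[Step] = field(default_factory=list)

    def tool_steps(self) -> List[Step]:
        """Return only the steps that executed a tool."""
        return [s for s in self.steps if s.tool_call is not None]


trajectory = Trajectory(
    question="What is the mass of 2 mol of water?",
    steps=[
        Step(reasoning="I need the molar mass of H2O.", label=1),
        Step(
            reasoning="Query the molar mass tool.",
            tool_call=ToolCall("molar_mass", {"formula": "H2O"}, "18.015 g/mol"),
            label=1,
        ),
        Step(reasoning="Multiply: 2 mol * 18.015 g/mol = 36.03 g.", label=1),
    ],
)
```

Keeping a label on every step, rather than only on the final answer, is what would let a verifier score the reasoning and the tool usage separately.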

More from arXiv cs.AI

arXiv cs.AI·David Krongauz, Arad Zulti, Eran Segal, Teddy Lazebnik

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Large Language Model Powered Agentic System

AI Summary

The MEDA system utilizes large language models and symbolic regression to autonomously discover ordinary differential equations for biological systems, achieving strong structural recovery and biologically plausible models. It outperforms existing methods by integrating domain knowledge and mechanistic constraints, demonstrating effective retrieval and extrapolation capabilities.

#LLM #Agent #Inference #AI Startup
