StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents
Quick Take
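StainFlow introduces an entity-stain tracking process reward model for GUI agents. In experiments on AndroidWorld and OGRBench, it yields relative improvements of 3.2% in online RL success and 1.8% in trajectory completion judgment accuracy. It targets two limitations of existing Process Reward Models: global milestone decomposition that is subjective and tied to a single execution path, and fixed local judging windows that can miss long-range evidence. StainFlow replaces these with objective task phase separation and dynamic evidence linking.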
Key Points
- StainFlow enhances GUI agent training with objective entity tracking.
- Global Entity Stain Tracking reduces subjectivity in task phase separation.
- Local Stain Evidence Linking dynamically constructs evidence windows.
- Online RL success improves by a relative 3.2% in experiments on AndroidWorld and OGRBench.
- Trajectory completion judgment accuracy improves by a relative 1.8% in the same experiments.
Article Content
arXiv:2606.07027v1
Abstract: Reinforcement Learning (RL) has become a promising approach for improving GUI Agents in long-horizon, stochastic digital environments, but trajectory-level success feedback is too sparse to provide reliable credit assignment for intermediate exploration steps. To mitigate this issue, recent studies introduce Process Reward Models (PRMs), which provide finer-grained training feedback through global milestone verification or local step-level evaluation.
However, these methods still suffer from two level-specific limitations: global milestone decomposition is subjective and singular, making it difficult to accommodate the multiple valid execution paths in real GUI tasks, while fixed local judging windows may miss long-range key evidence or dilute the decision signal with irrelevant frames. Inspired by stain-tracing mechanisms in network flow analysis, we propose StainFlow, an entity-stain-flow process reward model for GUI Agents.
To reduce the subjectivity of global partitioning, we introduce the Global Entity Stain Tracking module, which extracts visually verifiable task entities and tracks how their stain concentrations and states evolve along the trajectory, allowing task phases to be objectively separated by changes in the entity evidence flow.
To improve the accuracy of local verification, we introduce the Local Stain Evidence Linking module. Centered on the triggering entities of each candidate key node, it retrieves relevant steps based on their stain concentrations and state changes, and dynamically constructs high-density evidence windows for verifying true key nodes. Extensive experiments on AndroidWorld and OGRBench show that StainFlow achieves relative improvements of 3.2% in online RL success and 1.8% in trajectory completion judgment accuracy.
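The abstract does not give formulas for stain concentration or window construction, so the following is only a toy sketch of the two ideas, not the authors' implementation. It assumes each step is summarized as a mapping from entity name to a concentration in [0, 1]. The function names, thresholds, scoring rule, and example entities are all hypothetical. Phases are split where the total concentration change between consecutive steps is large, and an evidence window for a candidate key node keeps the steps where its triggering entities are most concentrated or changing.

```python
from typing import Dict, List, Set

# Hypothetical representation: entity name -> stain concentration in [0, 1].
Step = Dict[str, float]


def stain_change(prev: Step, curr: Step) -> float:
    """Total absolute change in concentration across all entities."""
    entities = set(prev) | set(curr)
    return sum(abs(curr.get(e, 0.0) - prev.get(e, 0.0)) for e in entities)


def split_phases(trajectory: List[Step], threshold: float = 0.5) -> List[int]:
    """Return the step indices where a new phase starts (index 0 always included)."""
    boundaries = [0]
    for t in range(1, len(trajectory)):
        if stain_change(trajectory[t - 1], trajectory[t]) >= threshold:
            boundaries.append(t)
    return boundaries


def evidence_window(
    trajectory: List[Step],
    node: int,
    triggers: Set[str],
    min_score: float = 0.3,
    max_steps: int = 4,
) -> List[int]:
    """Select up to max_steps steps at or before `node` that score highest
    on the triggering entities (assumed score: concentration plus change)."""
    scored = []
    for t in range(node + 1):
        curr = trajectory[t]
        prev = trajectory[t - 1] if t > 0 else {}
        score = sum(
            curr.get(e, 0.0) + abs(curr.get(e, 0.0) - prev.get(e, 0.0))
            for e in triggers
        )
        if score >= min_score:
            scored.append((score, t))
    top = sorted(scored, reverse=True)[:max_steps]
    return sorted(t for _, t in top)


# Made-up trajectory for a "message a contact" task.
trajectory = [
    {"search_box": 0.0, "contact_alice": 0.0},  # 0: home screen
    {"search_box": 0.9, "contact_alice": 0.0},  # 1: search opened
    {"search_box": 0.9, "contact_alice": 0.1},  # 2: typing a query
    {"search_box": 0.2, "contact_alice": 0.9},  # 3: contact page opened
    {"search_box": 0.0, "contact_alice": 0.9},  # 4: unrelated scroll
    {"search_box": 0.0, "contact_alice": 1.0},  # 5: message sent
]

print(split_phases(trajectory))  # [0, 1, 3]
print(evidence_window(trajectory, 5, {"contact_alice"}, max_steps=2))  # [3, 5]
```

In this toy run the window for the final step skips the scroll at step 4 and reaches back to step 3, which is the behavior a fixed-size window of the last two frames would not provide.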