MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration
Quick Take
MobileExplorer is a framework that accelerates on-device inference for mobile GUI agents, reducing the average number of reasoning steps and end-to-end latency by 23% while maintaining or improving task success rates by up to 5%. It explores UI elements in a lightweight way while the model is still reasoning and relies on a two-level rollback mechanism to restore the UI state, avoiding the privacy concerns and network-dependent latency of cloud-hosted models.
Key Points
- MobileExplorer accelerates on-device inference for vision-based mobile GUI agents.
- Reduces the average number of reasoning steps and end-to-end latency by 23%.
- Maintains or improves task success rates, by up to 5%.
- Utilizes a two-level rollback mechanism for reliable execution.
- Evaluated on multiple devices using the AndroidWorld benchmark.
Article Content
From the original summary (arXiv:2605.26546v1). Abstract: Mobile graphical user interface (GUI) agents enable AI models to autonomously operate smartphones on behalf of users. However, most existing systems focus primarily on optimizing task accuracy and rely on cloud-hosted models for inference, which introduces privacy concerns and network-dependent latency. As a result, fully on-device deployment of mobile GUI agents remains underexplored.
We propose MobileExplorer, a new framework that accelerates on-device inference for vision-based mobile GUI agents via online exploration. The key idea is to exploit the long per-step reasoning time of vision-language models (VLMs) by performing lightweight, parallel exploration of UI elements. During model inference, the agent proactively probes semantically relevant UI elements and records these exploration traces as structured memory.
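The overlap between slow inference and UI probing can be illustrated with a minimal Python sketch. It is not the authors' implementation: `slow_vlm_inference`, `probe_element`, and `step_with_exploration` are hypothetical stand-ins, the sleeps simulate device timing, and the sketch assumes probes run one after another on the single live UI while the model call runs in a background thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_vlm_inference(screenshot: str) -> str:
    """Hypothetical stand-in for a slow on-device VLM call."""
    time.sleep(0.2)  # simulated per-step reasoning time
    return "tap:settings"


def probe_element(element: str) -> dict:
    """Hypothetical stand-in for opening an element and noting where it leads."""
    time.sleep(0.05)  # simulated tap + observe + back-out
    return {"element": element, "leads_to": f"{element}_screen"}


def step_with_exploration(screenshot: str, candidates: list[str]):
    """Run inference in the background and probe candidates until it finishes."""
    traces = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        inference = pool.submit(slow_vlm_inference, screenshot)
        for element in candidates:
            if inference.done():
                break  # the model has answered; stop exploring
            traces.append(probe_element(element))
        action = inference.result()
    return action, traces


if __name__ == "__main__":
    action, traces = step_with_exploration("home.png", ["settings", "wifi", "about"])
    print(action, [t["element"] for t in traces])
```

The returned `traces` list plays the role of the structured memory described above; how many candidates get probed depends on how long the model call takes.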
To ensure reliable execution in live mobile environments, we design a two-level rollback mechanism that robustly restores the initial UI state when a fast but naive backtracking strategy fails. The collected exploration traces are then summarized into concise contextual hints and injected into the prompt to enhance the subsequent reasoning step.
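A two-level rollback followed by hint injection might look like the sketch below. The abstract does not say what the second, robust level does, so restarting and replaying a recorded path is an assumption made here for illustration. `FakeDevice`, `rollback`, and `build_hints` are hypothetical names, and the device is a toy model rather than a real Android interface.

```python
class FakeDevice:
    """Toy UI model (hypothetical): the UI state is a stack of screen names."""

    def __init__(self):
        self.stack = ["home"]
        self.back_broken = False  # simulates a screen that ignores "back"

    def state(self) -> tuple:
        return tuple(self.stack)

    def open(self, screen: str) -> None:
        self.stack.append(screen)

    def back(self) -> None:
        if not self.back_broken and len(self.stack) > 1:
            self.stack.pop()

    def restart_and_replay(self, path: tuple) -> None:
        # Assumed second level: relaunch, then replay the recorded path.
        self.stack = list(path)


def rollback(device: FakeDevice, initial_state: tuple, probe_depth: int) -> str:
    """Try cheap backtracking first; fall back to a full restore if it fails."""
    for _ in range(probe_depth):  # level 1: fast but naive
        device.back()
    if device.state() == initial_state:
        return "fast"
    device.restart_and_replay(initial_state)  # level 2: slower, robust
    if device.state() != initial_state:
        raise RuntimeError("could not restore the initial UI state")
    return "robust"


def build_hints(traces: list[dict]) -> str:
    """Summarize exploration traces into short hint lines for the next prompt."""
    lines = [f"- '{t['element']}' opens {t['leads_to']}" for t in traces]
    return "Exploration hints:\n" + "\n".join(lines)


if __name__ == "__main__":
    device = FakeDevice()
    device.open("settings")
    start = device.state()
    device.open("wifi")
    print(rollback(device, start, probe_depth=1))
    print(build_hints([{"element": "wifi", "leads_to": "wifi_screen"}]))
```

Checking the restored state against the recorded one after each level is the design point: the cheap path is used whenever it verifiably works, and the expensive path only when it does not.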
We evaluate MobileExplorer on multiple off-the-shelf devices using the AndroidWorld benchmark, as well as newly designed, more complex tasks and dynamic on-device environments. MobileExplorer reduces the average number of reasoning steps and end-to-end latency by 23%, while maintaining or improving task success rates by up to 5%. A video demonstration of MobileExplorer performance in the real world is available at https://youtu.be/thK7MJmdlvM.