An Interactive Paradigm for Deep Research
Quick Take
The SteER framework lets users steer deep research systems mid-process, outperforming state-of-the-art open-source and proprietary baselines by up to 22.80% on alignment. It combines diversity-aware planning with utility signals for alignment, novelty, and coverage, leads on quality metrics such as breadth and balance, and is preferred by human readers in more than 85% of pairwise alignment judgments.
Key Points
- SteER introduces interpretable control in long-horizon research workflows.
- Uses a cost-benefit formulation at each decision point to choose between pausing for user input and proceeding autonomously.
- Improves alignment by up to 22.80% over state-of-the-art baselines.
- Maintains an evolving persona model throughout the research session.
- To the authors' knowledge, the first framework to combine interactive, interpretable control with deep research.
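The cost-benefit decision in the second point can be illustrated as a simple expected-value test. This is a minimal sketch under stated assumptions, not SteER's formulation, which the summary does not spell out: the names `DecisionPoint` and `should_pause`, and the three inputs (probability that user intent has drifted, value of catching the drift now, cost of interrupting the user), are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecisionPoint:
    # Hypothetical inputs; the abstract does not define SteER's actual cost and benefit terms.
    p_intent_shift: float       # estimated probability the user's intent has drifted, in [0, 1]
    value_of_correction: float  # utility recovered if a drift is caught at this point
    interruption_cost: float    # cost of pausing the user (attention, added latency)


def should_pause(d: DecisionPoint) -> bool:
    """Pause for user input only when the expected benefit of asking exceeds its cost."""
    if not 0.0 <= d.p_intent_shift <= 1.0:
        raise ValueError("p_intent_shift must be a probability")
    expected_benefit = d.p_intent_shift * d.value_of_correction
    return expected_benefit > d.interruption_cost


# Toy session: only the second decision point justifies interrupting the user.
session = [
    DecisionPoint(p_intent_shift=0.1, value_of_correction=1.0, interruption_cost=0.3),
    DecisionPoint(p_intent_shift=0.6, value_of_correction=1.0, interruption_cost=0.3),
]
actions = ["pause" if should_pause(d) else "proceed" for d in session]
print(actions)  # ['proceed', 'pause']
```

One possible refinement, also an assumption, is to scale `value_of_correction` with the number of remaining steps, since an undetected drift early in a long autonomous run wastes more work than one caught near the end.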
Article Content
arXiv:2605.24266v1. Abstract: Recent advances in large language models (LLMs) have enabled deep research systems that synthesize comprehensive, report-style answers to open-ended queries by combining retrieval, reasoning, and generation. Yet most frameworks rely on rigid workflows with one-shot scoping and long autonomous runs, offering little room for course correction if user intent shifts mid-process.
We present SteER, a framework for Steerable deEp Research that introduces interpretable, mid-process control into long-horizon research workflows. At each decision point, SteER uses a cost-benefit formulation to determine whether to pause for user input or to proceed autonomously. It combines diversity-aware planning with utility signals that reward alignment, novelty, and coverage, and maintains a live persona model that evolves throughout the session.
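The planning loop described in this paragraph can be illustrated with a greedy selector that re-scores candidate sub-queries after every pick, together with a moving-average persona update. This is a hedged sketch, not SteER's actual method: representing sub-queries and the persona as aspect tags, the MMR-style greedy selection, the Jaccard-based novelty term, the weights, and the names `Candidate`, `plan_next`, and `update_persona` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    query: str
    aspects: frozenset[str]  # hypothetical aspect tags the sub-query would cover


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def utility(c: Candidate, persona: dict[str, float], selected: list[Candidate],
            covered: set[str], weights: tuple[float, float, float] = (1.0, 0.5, 0.5)) -> float:
    w_align, w_novel, w_cover = weights
    n = max(len(c.aspects), 1)
    # Alignment: mean persona interest over the candidate's aspects.
    align = sum(persona.get(t, 0.0) for t in c.aspects) / n
    # Novelty: 1 minus the largest overlap with any sub-query already chosen.
    novelty = 1.0 - max((jaccard(c.aspects, s.aspects) for s in selected), default=0.0)
    # Coverage: share of the candidate's aspects not yet covered.
    cover = len(c.aspects - covered) / n
    return w_align * align + w_novel * novelty + w_cover * cover


def plan_next(candidates: list[Candidate], persona: dict[str, float], k: int) -> list[Candidate]:
    """Greedy diversity-aware selection that re-scores the pool after every pick."""
    selected: list[Candidate] = []
    covered: set[str] = set()
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda c: utility(c, persona, selected, covered))
        selected.append(best)
        covered |= best.aspects
        pool.remove(best)
    return selected


def update_persona(persona: dict[str, float], feedback: dict[str, float],
                   lr: float = 0.3) -> dict[str, float]:
    """Move tag weights toward the latest feedback; tags without feedback keep their weight."""
    updated = dict(persona)
    for t, signal in feedback.items():
        updated[t] = (1 - lr) * persona.get(t, 0.0) + lr * signal
    return updated


persona = {"cost": 0.9, "safety": 0.2, "policy": 0.5}
candidates = [
    Candidate("battery cost trends", frozenset({"cost"})),
    Candidate("battery cost forecasts", frozenset({"cost"})),
    Candidate("grid safety rules", frozenset({"safety", "policy"})),
]
plan = plan_next(candidates, persona, k=2)
print([c.query for c in plan])  # ['battery cost trends', 'grid safety rules']
```

With a persona weighted toward cost, the cost query wins the first pick, but its near-duplicate then loses all novelty and coverage credit, so the safety and policy query is chosen second. Re-scoring after each pick is what makes the selection diversity-aware rather than a plain top-k by alignment; `update_persona` would be called whenever the user responds at a pause.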
SteER outperforms state-of-the-art open-source and proprietary baselines by up to 22.80% on alignment, leads on quality metrics such as breadth and balance, and is preferred by human readers in 85%+ of pairwise alignment judgments. We also introduce a persona-query benchmark and data-generation pipeline. To our knowledge, this is the first work to advance deep research with an interactive, interpretable control paradigm, paving the way for controllable, user-aligned agents in long-form tasks.