WRIT: Write-Read Intensive Trajectory Synthesis for Multi-Turn User-Facing Agents

arXiv cs.CL·Hengrui Gu, Xiaotian Han, Kaixiong Zhou

6/3/2026

·~2 min·6/3/2026·en·3

Quick Answer

This paper shows that The WRIT pipeline synthesizes complex multi-turn training trajectories for user-facing agents, enabling robust decision-making under high information load.

Quick Take

The WRIT pipeline synthesizes complex multi-turn training trajectories for user-facing agents, enabling robust decision-making under high information load. A 4B model trained on 2K WRIT trajectories outperforms GPT-5.1 on the τ²-bench while reducing inference-time token usage, demonstrating efficient agent behavior.

Key Points

WRIT synthesizes write-intensive and read-heavy tasks for agent training.
It diversifies user behavior to reflect realistic conversational variations.
Training with WRIT improves decision-making under high information load.
A 4B model trained on WRIT outperforms GPT-5.1 on τ²-bench.
Only 2K synthesized trajectories significantly reduce inference-time token usage.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Content

From source RSS / original summary

arXiv:2606. 02908v1 Announce Type: new Abstract: Multi-turn user-facing agents must infer user intent from incomplete requests, collect missing information through dialogue and tools, and execute valid actions. A training trajectory records this process as an interleaved sequence of user messages, agent responses, tool calls, etc.

Synthesizing sufficiently complex trajectory has become a central route to train agents: existing pipelines often increase difficulty by composing multiple user requests into longer tasks, producing write-intensive trajectories that train sequential execution. We argue that a single write decision can itself be difficult when the agent must gather and compare substantial read-tool evidence before its arguments become identifiable, a challenge that write-intensive data alone cannot address.

Guided by this insight, we propose WRIT (\uline{W}rite-\uline{R}ead \uline{I}ntensive \uline{T}rajectory Synthesis), a pipeline for synthesizing multi-turn agent training trajectories along two complexity axes: the number of write decisions in a task and the evidence burden of each individual decision. WRIT first generates write-intensive and read-heavy tasks.

It then diversifies user behavior instructions to reflect realistic conversational variation, and finally simulates agent-user interactions in an executable environment to produce complete training trajectories. The resulting data trains agents not only for longer task execution, but also for robust, evidence-grounded decision making under high information load. With only 2K synthesized trajectories, a 4B model trained on WRIT outperforms GPT-5.

1 no-think on $\tau^2$-bench and substantially reduces inference-time token usage, showing that compact SFT data can convert part of expensive test-time reasoning into efficient agent behavior.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CL

See more →

arXiv cs.CL·Miguel Arana-Catania, Catherine Conisbee, Matthew Kidd

5d ago

FeaturedOriginal

Letting the Data Speak: Extracting Keywords from Crowdsourced Collections with AI

AI Summary

The study evaluates three NLP approaches—Named Entity Recognition, Keyword Extraction, and Topic Modelling—using the Their Finest Hour Online Archive to automate keyword extraction from crowdsourced WWII collections. Findings suggest that while NLP methods show promise, no single approach is sufficient, and ethical considerations in automated keyword extraction are crucial for responsible stewardship.

#AI Coding #Inference #Open Source #Policy

WRIT: Write-Read Intensive Trajectory Synthesis for Multi-Turn User-Facing Agents

Quick Answer

Quick Take

Key Points

Paper Resources

Article Content

Want this in your inbox every morning?

More from arXiv cs.CL

Letting the Data Speak: Extracting Keywords from Crowdsourced Collections with AI

Quantifying Prior Dominance in Systems

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

Quick Answer

Quick Take

Key Points

Paper Resources

Article Content

Want this in your inbox every morning?

More from arXiv cs.CL

Letting the Data Speak: Extracting Keywords from Crowdsourced Collections with AI

Quantifying Prior Dominance in RAG Systems

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

Quantifying Prior Dominance in Systems