
WRIT: Write-Read Intensive Trajectory Synthesis for Multi-Turn User-Facing Agents

¹North Carolina State University    ²Case Western Reserve University

Teaching agents not just to act more, but to know more before they act.

Highlights
2 axes

WRIT controls trajectory complexity along two axes: the number of write decisions in a task, and the evidence burden behind each individual decision. The second, read-heavy axis remains underexplored.

2K → 4B

With only 2K synthesized trajectories, a 4B model trained on WRIT data surpasses GPT-5.1 no-think in average Pass¹ on τ2-bench (67.99 vs. 62.80).

Hard ↑

Gains are largest on the read-heavy hard subsets, exactly the tasks where an agent must gather and compare evidence before committing to an action.

Motivation

The same write action, two very different difficulties

Most synthesis pipelines make tasks harder by composing more write actions. Yet a single write decision can already be hard when the agent must gather and compare substantial read-tool evidence before the decision's arguments become identifiable.

Simple task (2 reads)
"I need to book a one-way business class flight from Newark to Houston on May 25. Please book the direct flight that departs at 8:00 AM and arrives at 11:30 AM."
Read tools: get_user_details, search_direct_flight
Write tool: book_reservation

Read-heavy task (9 reads)
"I need to book a one-way business class flight from the New York area to Houston. I'm flexible between May 25 and May 26, and I can depart from either Newark or LaGuardia. Please book the fastest overall flight."
Read tools: get_user_details, search_direct_flight, search_onestop_flight
Write tool: book_reservation
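To see where the nine reads in the read-heavy task can come from, here is a minimal sketch under stated assumptions: one get_user_details call plus a direct and a one-stop search for each of the two origin airports and two dates (1 + 2 × 2 × 2 = 9). This decomposition is our guess, not the paper's gold trace, and the flight IDs, durations, and the TOY_RESULTS table are hypothetical toy data.

```python
from itertools import product

# Hypothetical toy search results keyed by (tool, origin, date); destination is Houston.
# Each flight is (flight_id, total_minutes); one-stop totals include the layover.
TOY_RESULTS = {
    ("search_direct_flight", "EWR", "05-25"): [("D1", 240)],
    ("search_direct_flight", "EWR", "05-26"): [],
    ("search_direct_flight", "LGA", "05-25"): [("D2", 250)],
    ("search_direct_flight", "LGA", "05-26"): [("D3", 235)],
    ("search_onestop_flight", "EWR", "05-25"): [("S1", 310)],
    ("search_onestop_flight", "EWR", "05-26"): [("S2", 295)],
    ("search_onestop_flight", "LGA", "05-25"): [],
    ("search_onestop_flight", "LGA", "05-26"): [("S3", 230)],
}

def plan_and_choose(origins, dates):
    """Issue every read the request implies, then pick the fastest flight."""
    reads = ["get_user_details"]  # profile lookup comes first
    candidates = []
    for tool, origin, date in product(
        ["search_direct_flight", "search_onestop_flight"], origins, dates
    ):
        reads.append(tool)
        candidates.extend(TOY_RESULTS[(tool, origin, date)])
    # The write argument is only identifiable after comparing all candidates.
    best_id, _ = min(candidates, key=lambda flight: flight[1])
    return reads, best_id

reads, choice = plan_and_choose(["EWR", "LGA"], ["05-25", "05-26"])
print(len(reads), choice)  # 9 S3
```

In this toy data the fastest option is a one-stop flight, so an agent that only ran search_direct_flight would book the wrong flight even though its single write call looks identical.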

Both tasks share the same gold write action. The difference lies in what the agent must do before writing: the read-tool count rises from 2 to 9. This motivates a new data-synthesis question:

Beyond teaching agents to act for longer, can we synthesize trajectories that teach them to read more carefully before they act?
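The two complexity axes can be measured on any tool-call trace. A minimal sketch, assuming each tool is labeled read or write through hypothetical READ_TOOLS and WRITE_TOOLS sets, and taking "evidence burden" to mean the number of read calls since the previous write (our proxy, not necessarily the paper's definition). The read-heavy call order below is likewise an assumption.

```python
# Hypothetical read/write labels for the airline tools named above.
READ_TOOLS = {"get_user_details", "search_direct_flight", "search_onestop_flight"}
WRITE_TOOLS = {"book_reservation"}

def complexity_axes(trace):
    """Return (number of write decisions, reads preceding each write)."""
    burdens, reads_since_write = [], 0
    for tool in trace:
        if tool in READ_TOOLS:
            reads_since_write += 1
        elif tool in WRITE_TOOLS:
            burdens.append(reads_since_write)
            reads_since_write = 0  # evidence is counted per write decision
        else:
            raise ValueError(f"unlabeled tool: {tool}")
    return len(burdens), burdens

simple = ["get_user_details", "search_direct_flight", "book_reservation"]
read_heavy = (["get_user_details"] + ["search_direct_flight"] * 4
              + ["search_onestop_flight"] * 4 + ["book_reservation"])  # assumed order
print(complexity_axes(simple))      # (1, [2])
print(complexity_axes(read_heavy))  # (1, [9])
```

Both traces score 1 on the write axis; only the read axis separates them, which is the distinction the example draws.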
The WRIT Pipeline

Synthesizing write- and read-intensive trajectories in three stages

Overview of the WRIT pipeline.
STAGE 1

Write-Read Intensive Tasks

Synthesize tasks with verifiable outcomes, covering both write-intensive requests that require multiple sequential actions and read-heavy requests in which a single action demands extensive evidence gathering.
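One common way to make an outcome verifiable, and the kind of check τ-bench-style environments use, is to replay the gold write actions on a copy of the initial database and compare the result with the database the agent leaves behind. The sketch below assumes a dict-based database and a hypothetical apply_action that models only book_reservation; it is an illustration, not the paper's implementation.

```python
import copy

def apply_action(db, action):
    """Hypothetical environment step: only book_reservation is modeled here."""
    name, args = action
    if name != "book_reservation":
        raise ValueError(f"unsupported write tool: {name}")
    db["reservations"].append({"user": args["user_id"], "flight": args["flight"]})

def outcome_matches(initial_db, gold_writes, agent_final_db):
    """A task counts as solved iff the agent's final DB equals the gold final DB."""
    gold_db = copy.deepcopy(initial_db)  # never mutate the shared initial state
    for action in gold_writes:
        apply_action(gold_db, action)
    return gold_db == agent_final_db

initial = {"reservations": []}
gold = [("book_reservation", {"user_id": "u1", "flight": "S3"})]  # hypothetical IDs
agent_db = {"reservations": [{"user": "u1", "flight": "S3"}]}
print(outcome_matches(initial, gold, agent_db))  # True
```

Checking the final state rather than the exact call sequence lets any order of read calls pass as long as the resulting state is correct, which matters when a read-heavy task admits many valid search orders.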

STAGE 2

User Behavior Diversification

Vary how users express and reveal the same request, so that the training data reflects realistic conversational behavior rather than only cooperative, fully specified interactions.
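As a hypothetical illustration of varying how the same request is revealed, the sketch below renders one fixed set of task constraints under two assumed disclosure styles: a cooperative user who states everything up front, and a user who reveals one constraint per turn. The style names and the turn-splitting rule are our assumptions, not the paper's taxonomy of user behaviors.

```python
def render_user_turns(constraints, style):
    """Return the user's messages for one task under a given disclosure style."""
    if style == "fully_specified":
        return ["; ".join(constraints)]  # everything in the first message
    if style == "incremental":
        return list(constraints)         # one constraint per turn
    raise ValueError(f"unknown style: {style}")

constraints = [
    "one-way business class to Houston",
    "from Newark or LaGuardia",
    "on May 25 or May 26",
    "the fastest overall flight",
]
for style in ("fully_specified", "incremental"):
    print(style, len(render_user_turns(constraints, style)))
# fully_specified 1
# incremental 4
```

Both renderings share the same gold outcome, so an outcome check defined in Stage 1 still applies; only the conversation the agent must handle changes.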

STAGE 3

Simulation & Filtering

Run the agent and a user simulator through each task in an executable environment, keeping only correct and complete interactions as supervised fine-tuning (SFT) trajectories.
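The filtering step amounts to rejection sampling over simulated dialogues. A minimal sketch, assuming each simulated episode records whether the outcome check passed and whether the conversation ended normally (for example, without hitting a turn limit or an error); the Episode class and its fields are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    task_id: str
    messages: list = field(default_factory=list)
    outcome_correct: bool = False  # e.g. the final-state check passed
    completed: bool = False        # conversation ended without truncation or errors

def filter_for_sft(episodes):
    """Keep only correct and complete episodes as SFT trajectories."""
    return [ep.messages for ep in episodes if ep.outcome_correct and ep.completed]

eps = [
    Episode("t1", ["m1"], True, True),   # kept
    Episode("t2", ["m2"], True, False),  # correct but truncated: dropped
    Episode("t3", ["m3"], False, True),  # complete but wrong: dropped
]
print(filter_for_sft(eps))  # [['m1']]
```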

Results

WRIT improves multi-turn agents across model families

On τ2-bench, under a controlled 2K-trajectory budget, WRIT consistently outperforms the strongest prior synthesis method on every tested base model.

Base model               Method                  Retail   Airline   Average   Gain (Avg.)
Qwen3-4B-Instruct        AReaL (best baseline)   59.43    47.00     55.64
                         WRIT                    71.05    61.00     67.99     +12.4
Llama-3.1-8B-Instruct    CoVe (best baseline)    52.19    32.00     46.04
                         WRIT                    54.61    50.00     53.20     +7.2
Qwen2.5-14B-Instruct     AReaL (best baseline)   57.68    43.00     53.20
                         WRIT                    72.37    57.50     67.84     +14.6

Pass¹ (%) on τ2-bench. Average is task-count weighted across Retail and Airline. Full tables, Pass⁴, hard subsets, and ablations are in the paper.
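The task-count weighted average can be reproduced from the per-domain scores. This sketch assumes τ2-bench's split of 114 Retail and 50 Airline tasks, which is consistent with the averages reported here; because the per-domain numbers shown are already rounded, a recomputed average can differ from the reported one in the last digit.

```python
# Assumed task counts for the two τ2-bench domains.
TASK_COUNTS = {"retail": 114, "airline": 50}

def weighted_average(scores):
    """Task-count weighted mean of per-domain Pass^1 scores (in %)."""
    total = sum(TASK_COUNTS.values())
    return sum(scores[domain] * n for domain, n in TASK_COUNTS.items()) / total

baseline = weighted_average({"retail": 59.43, "airline": 47.00})  # AReaL, Qwen3-4B
writ = weighted_average({"retail": 71.05, "airline": 61.00})      # WRIT, Qwen3-4B
print(f"{baseline:.2f} {writ:.2f}")  # 55.64 67.99
```

Weighting by task count means Retail, with more than twice as many tasks, dominates the average.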

Pass^k reliability curves for Qwen3-4B-Instruct-2507 on the full and read-heavy subsets.
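Pass^k, as defined for τ-bench and τ2-bench, is the probability that all k independent trials of a task succeed, averaged over tasks. With n trials per task and c successes, the per-task estimate is C(c, k) / C(n, k). A minimal sketch (the function name pass_hat_k and the toy success counts are ours):

```python
from math import comb

def pass_hat_k(successes, n_trials, k):
    """Average over tasks of C(c, k) / C(n, k), where c = successes on a task."""
    if not 1 <= k <= n_trials:
        raise ValueError("k must be between 1 and n_trials")
    per_task = [comb(c, k) / comb(n_trials, k) for c in successes]
    return sum(per_task) / len(per_task)

# Three toy tasks, 4 trials each: always solved, solved twice, never solved.
successes = [4, 2, 0]
print([round(pass_hat_k(successes, 4, k), 3) for k in range(1, 5)])
# [0.5, 0.389, 0.333, 0.333]
```

Pass¹ reduces to the mean success rate, and the estimate can never rise as k grows, so a flatter curve indicates a more consistent agent.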
A 4B model vs. GPT-5.1 on τ2-bench

Model              Avg. Pass¹   Output tokens   Cost
GPT-5.1 thinking   79.27        1.52M           $17.52
GPT-5.1 no-think   62.80        318K            $5.56
WRIT-4B            67.99        251K            n/a

A 4B model trained on just 2K WRIT trajectories reaches 67.99 average Pass¹, ahead of GPT-5.1 no-think (62.80), while emitting fewer output tokens at inference time (251K vs. 318K).

Citation

BibTeX

If you find WRIT useful, please consider citing:
@misc{gu2026writwritereadintensivetrajectory,
  title={WRIT: Write-Read Intensive Trajectory Synthesis for Multi-Turn User-Facing Agents},
  author={Hengrui Gu and Xiaotian Han and Kaixiong Zhou},
  year={2026},
  eprint={2606.02908},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2606.02908},
}