Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys, Up from Base GPT-5.4’s 33.5%
Quick Take
Microsoft Research's Webwright framework, powered by GPT-5.4, scores 60.1% on the long-horizon Odysseys benchmark, up from 33.5% for base GPT-5.4.
Key Points
- Webwright replaces click-trace automation with Playwright scripts.
- Uses a single agent loop across three modules, in roughly 1,000 lines of code.
- Reaches 86.7% on Online-Mind2Web, the highest AutoEval score among open-sourced harness recipes.
Article Excerpt
From the source RSS summary (MarkTechPost): Microsoft Research introduces Webwright, a terminal-native browser agent framework that replaces click-trace web automation with reusable Playwright scripts. Using a single agent loop across three modules and roughly 1,000 lines of code, Webwright powered by GPT-5.4 reaches 60.1% on the long-horizon Odysseys benchmark and 86.7% on Online-Mind2Web, the highest AutoEval score among open-sourced harness recipes.
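To make the idea of a "reusable Playwright script" concrete, here is a minimal sketch in Python. It is not Webwright's code: the function names `search` and `run_in_browser`, the `SearchResult` type, and the default `input[name='q']` selector are all hypothetical, chosen only to illustrate how a parameterized script differs from a one-off recorded click trace.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    query: str
    title: str


def search(page, base_url: str, query: str,
           box_selector: str = "input[name='q']") -> SearchResult:
    """Reusable step: open a site, submit a query, return the page title.

    `page` is expected to behave like a Playwright sync `Page`.
    The selector default is an assumption, not taken from any real site.
    """
    page.goto(base_url)
    page.fill(box_selector, query)
    page.press(box_selector, "Enter")
    page.wait_for_load_state()  # wait for the results page to finish loading
    return SearchResult(query=query, title=page.title())


def run_in_browser(base_url: str, queries: list[str]) -> list[SearchResult]:
    """Run the same step for several queries in one headless browser session."""
    # Imported lazily so `search` can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        try:
            return [search(page, base_url, q) for q in queries]
        finally:
            browser.close()
```

A click trace records one fixed sequence of actions, whereas a script like this takes the site and query as parameters, so the same step can be called again for new inputs. Calling `run_in_browser("https://example.com", ["webwright"])` would need Playwright and a Chromium build installed, plus a page that actually has a matching search box.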
