SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents
Quick Take
SimGym enables rapid A/B test simulations in e-commerce using vision-language model agents.
Key Points
- Reduces A/B testing time from weeks to under an hour.
- Derives per-shop buyer archetypes and intents from production clickstream data (traffic-grounded persona generation).
- Achieves 77% directional alignment with add-to-cart shifts observed in real-buyer traffic.
Authors: Han Li, Vibhor Malik, Zahra Zanjani Foumani, Alberto Castelo, Shuang Xie, Ailin Fan, Keat Yang Koay, Yuanzheng Zhu, Meysam Feghhi, Ronie Uliana, Zhaoyu Zhang, Angelo Ocana Martins, Mingyu Zhao, Francis Pelland, Jonathan Faerman, Nikolas LeBlanc, Aaron Glazer, Andrew McNamara, Zhong Wu, Lingyun Wang
Abstract: A/B testing remains the gold standard for evaluating modifications to e-commerce storefronts, yet it diverts traffic, requires weeks to reach statistical significance, and risks degrading user experience. We present SimGym, a framework for simulating A/B tests on e-commerce storefronts using vision-language model (VLM) agents operating in a live browser. The framework comprises three key components: (a) a traffic-grounded persona generation pipeline that derives per-shop buyer archetypes and intents from production clickstream data; (b) a live-browser agent architecture that combines multimodal perception over visual and browser-structured observations with episodic memory and guardrails to conduct coherent shopping sessions across control and treatment storefronts; and (c) an evaluation protocol that compares simulated outcome shifts with observed shifts in real buyer behavior. We validate SimGym on A/B tests of visually driven UI theme changes from a major e-commerce platform across diverse storefronts and product categories. Empirical results show that SimGym agents achieve strong agreement with observed outcome shifts, attaining 77% directional alignment with add-to-cart shifts observed across interface variants in real-buyer traffic. It reduces experimental cycles from weeks to under an hour, enabling rapid experimentation without exposing real buyers to candidate variants.
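The abstract does not spell out how directional alignment is computed. One plausible reading, offered here only as a minimal sketch and not as the authors' evaluation protocol, is the fraction of A/B tests in which the simulated add-to-cart shift (treatment minus control) has the same sign as the shift observed in real traffic. The function names, the treatment of zero shifts, and the sample numbers below are all illustrative assumptions.

```python
from typing import Sequence


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_alignment(
    simulated_shifts: Sequence[float],
    observed_shifts: Sequence[float],
) -> float:
    """Fraction of experiments where simulated and observed shifts agree in sign.

    Each entry is a hypothetical add-to-cart rate shift (treatment minus
    control) for one A/B test. Assumption: a zero shift only matches another
    zero shift; the source does not say how ties are handled.
    """
    if len(simulated_shifts) != len(observed_shifts):
        raise ValueError("both sequences must cover the same experiments")
    if not simulated_shifts:
        raise ValueError("at least one experiment is required")
    matches = sum(
        sign(s) == sign(o) for s, o in zip(simulated_shifts, observed_shifts)
    )
    return matches / len(simulated_shifts)


# Made-up shifts for four experiments; three of the four agree in direction.
simulated = [0.02, -0.01, 0.03, -0.04]
observed = [0.01, -0.02, -0.01, -0.03]
alignment = directional_alignment(simulated, observed)
print(f"directional alignment: {alignment:.0%}")
```

A sign-based metric like this ignores the magnitude of each shift, so it says whether the simulation picks the better variant, not how much better that variant is.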
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.19219 [cs.AI] (or arXiv:2605.19219v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.19219 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Zahra Zanjani Foumani
[v1] Tue, 19 May 2026 00:46:41 UTC (6,393 KB)
— Originally published at arxiv.org