Agentic Trading: When LLM Agents Meet Financial Markets
Quick Answer
This paper surveys Large Language Model (LLM) agents in trading systems and finds that, within its primary subset of 19 studies, only 2 report extractable time-consistent split protocols and none reaches R3 reproducibility, pointing to significant gaps in evaluation protocols and execution semantics.
Quick Take
Within the primary subset, 15 of 19 studies are coded R0 for reproducibility, only 1 reports an explicit transaction-cost model, and only 1 documents universe or survivorship handling. The paper argues that comparable evaluation protocols, execution semantics, and reproducible artifacts, rather than new architectures, are the field's immediate bottlenecks.
Key Points
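- 77 studies included, of which 19 form the primary empirical subset (Action Output plus Closed-Loop Evaluation) and 58 are retained as background and design context.
- Only 2 of 19 primary studies report extractable time-consistent split protocols; none reaches R3 reproducibility.
- 11 of 19 primary studies report execution timing or semantics.
- Architecture-Capability-Adaptation is used as a working analytical lens, not a validated taxonomy.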
- Field faces bottlenecks in comparable evaluation and reproducibility.
Paper Resources
Abstract: A growing body of work explores how Large Language Models (LLMs) can be embedded in trading systems as agents that perceive market information, retrieve context, reason about decisions, emit tradable actions, and adapt under market feedback. This paper reframes LLM-based trading agents as expert-system decision pipelines and presents an audit-oriented evidence map of 77 included studies in a protocol-coded snapshot screened through 2026-03-09. A primary empirical subset (n=19) satisfies the minimum boundary of Action Output plus Closed-Loop Evaluation; the remaining 58 included studies are retained as background and design context. The central empirical finding is protocol incomparability: within the primary subset, only 2/19 studies report extractable time-consistent split protocols, 1/19 reports an explicit transaction-cost model, 1/19 documents universe or survivorship handling, 11/19 report execution timing or semantics, 15/19 are coded as R0, and no study reaches R3 reproducibility. We therefore use Architecture-Capability-Adaptation as a working analytical lens rather than a validated taxonomy, and we foreground the evidence ledger, reproducibility audit, and reporting checklist as the main contributions. The resulting survey shows that architectural experimentation is expanding rapidly, while comparable evaluation protocols, execution semantics, and reproducible artifacts remain the field's immediate bottlenecks.
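The audit counts in the abstract are easier to compare as shares of the primary subset. The following is a minimal sketch that only restates the figures quoted above; the names `reported` and `share` are hypothetical and introduced here for illustration, not taken from the paper.

```python
# Counts as stated in the abstract; nothing here is new data.
INCLUDED_N = 77   # all included studies
PRIMARY_N = 19    # primary empirical subset

# Number of primary studies reporting each item (hypothetical dict name).
reported = {
    "time-consistent split protocol": 2,
    "explicit transaction-cost model": 1,
    "universe or survivorship handling": 1,
    "execution timing or semantics": 11,
    "coded R0": 15,
    "reaches R3": 0,
}


def share(count: int, total: int = PRIMARY_N) -> float:
    """Return count / total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Studies kept only as background and design context.
background_n = INCLUDED_N - PRIMARY_N

for item, count in reported.items():
    print(f"{item:<36} {count:>2}/{PRIMARY_N}  ({share(count):.1f}%)")
print(f"background studies: {background_n}")
```

Read this way, 2/19 is roughly 10.5% of the primary subset, while execution timing or semantics (11/19) is the only reporting item covered by more than half of the studies.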
| Comments: | 59 pages, 15 figures, 27 tables |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2605.19337 [cs.AI] (or arXiv:2605.19337v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.19337 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Wang Taotao
[v1] Tue, 19 May 2026 04:20:07 UTC (12,456 KB)
— Originally published at arxiv.org