Agentic Trading: When LLM Agents Meet Financial Markets

arXiv cs.AI·Yihan Xia, Panpan You, Taotao Wang, Fang Liu, Han Qi, Xiaoxiao Wu, Shengli Zhang

5/20/2026

Quick Take

This paper surveys how Large Language Models (LLMs) are embedded in trading systems as agents. Its central finding is protocol incomparability: within the 19-study primary subset, only 2 studies report extractable time-consistent split protocols, 1 reports an explicit transaction-cost model, and none reaches R3 reproducibility. The authors conclude that comparable evaluation protocols, execution semantics, and reproducible artifacts are the field's immediate bottlenecks.

Key Points

77 studies included in the evidence map; 19 of them form the primary empirical subset.
Only 2 of 19 primary studies report extractable time-consistent split protocols.
11 of 19 primary studies report execution timing or semantics.
Architecture-Capability-Adaptation is used as a working analytical lens, not a validated taxonomy.
Comparable evaluation protocols, execution semantics, and reproducible artifacts remain the field's immediate bottlenecks.

[Submitted on 19 May 2026]

Abstract: A growing body of work explores how Large Language Models (LLMs) can be embedded in trading systems as agents that perceive market information, retrieve context, reason about decisions, emit tradable actions, and adapt under market feedback. This paper reframes LLM-based trading agents as expert-system decision pipelines and presents an audit-oriented evidence map of 77 included studies in a protocol-coded snapshot screened through 2026-03-09. A primary empirical subset (n=19) satisfies the minimum boundary of Action Output plus Closed-Loop Evaluation; the remaining 58 included studies are retained as background and design context. The central empirical finding is protocol incomparability: within the primary subset, only 2/19 studies report extractable time-consistent split protocols, 1/19 reports an explicit transaction-cost model, 1/19 documents universe or survivorship handling, 11/19 report execution timing or semantics, 15/19 are coded as R0, and no study reaches R3 reproducibility. We therefore use Architecture-Capability-Adaptation as a working analytical lens rather than a validated taxonomy, and we foreground the evidence ledger, reproducibility audit, and reporting checklist as the main contributions. The resulting survey shows that architectural experimentation is expanding rapidly, while comparable evaluation protocols, execution semantics, and reproducible artifacts remain the field's immediate bottlenecks.
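
To put the audit counts in the abstract on a common scale, the short Python sketch below converts each reported count into a share of the 19-study primary subset. The counts are copied from the abstract; the dictionary labels and the helper name `coverage` are illustrative assumptions and are not taken from the paper's evidence ledger.

```python
# Counts copied from the abstract. The labels below are illustrative
# shorthand, not the paper's own field names (assumption).
INCLUDED_N = 77   # studies in the evidence map
PRIMARY_N = 19    # primary empirical subset

reported = {
    "time-consistent split protocol": 2,
    "explicit transaction-cost model": 1,
    "universe or survivorship handling": 1,
    "execution timing or semantics": 11,
    "coded R0": 15,
    "reaches R3": 0,
}


def coverage(count: int, total: int = PRIMARY_N) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    # Studies kept only as background and design context.
    print(f"background studies: {INCLUDED_N - PRIMARY_N}")
    for label, count in reported.items():
        print(f"{label}: {count}/{PRIMARY_N} = {coverage(count)}%")
```

Read this way, execution timing or semantics is the only item reported by more than half of the primary subset, while split protocols, cost models, and survivorship handling each appear in roughly one study in ten or fewer.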

Comments:	59 pages, 15 figures, 27 tables
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2605.19337 [cs.AI]
	(or arXiv:2605.19337v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.19337 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Wang Taotao
[v1] Tue, 19 May 2026 04:20:07 UTC (12,456 KB)

— Originally published at arxiv.org
