CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation

arXiv cs.AI·Chenying Lin, Yichen Hai, Yi He, Ran Wang, Haiyan Qiang, Liang Yu

5/18/2026


Quick Answer

CAX-Agent is a lightweight agent harness designed for reliable MAPDL automation, addressing execution control and fault recovery challenges.

Quick Take

In an empirical evaluation of CAX-Agent's recovery strategies, 'model_only' performs best, with a completion rate of 92.67% and a task score of 3.59/4. The evaluation comprises 450 case runs in total, spread across three strategies (no_recovery, rule_only, and model_only).
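The run counts behind these figures can be checked with a little arithmetic. The sketch below uses only numbers stated in the abstract (50 benchmarks, three strategies, three repeated runs per strategy, and the reported completion rates). It assumes that completion is recorded as a yes/no outcome per case run, which the abstract does not state explicitly; the variable and function names are illustrative.

```python
# Figures taken from the abstract.
BENCHMARKS = 50
STRATEGIES = ("no_recovery", "rule_only", "model_only")
REPEATS = 3

runs_per_strategy = BENCHMARKS * REPEATS             # 50 * 3
total_runs = runs_per_strategy * len(STRATEGIES)     # 150 * 3

reported_rates = {
    "no_recovery": 0.6933,
    "rule_only": 0.7733,
    "model_only": 0.9267,
}


def implied_completions(rate: float, n_runs: int) -> int:
    """Completed runs implied by a rate, ASSUMING one binary outcome per run."""
    return round(rate * n_runs)


implied = {
    name: implied_completions(rate, runs_per_strategy)
    for name, rate in reported_rates.items()
}

for name in STRATEGIES:
    print(f"{name}: about {implied[name]} of {runs_per_strategy} runs completed")
```

Under that assumption, each reported rate corresponds to a whole number of completed runs out of 150 per strategy, not out of the 450 total.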

Key Points

CAX-Agent organizes execution into three layers: LLM service, agent harness, and solver backend.
The 'model_only' strategy achieved the highest completion rate of 92.67% among three recovery strategies.
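Inter-rater agreement between two blind human raters was strong, with a quadratic weighted Cohen's kappa of 0.84 and 96 percent of score pairs within one point.
The study evaluated three recovery strategies on 50 standard structural benchmarks, with three repeated runs per strategy (450 case runs in total).
The benchmark uses deliberately simple geometries to isolate recovery-policy effects; the authors discuss the scope of the findings and directions for broader validation.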
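For readers unfamiliar with the agreement statistic in the abstract, here is a minimal sketch of a quadratic weighted Cohen's kappa, together with the share of score pairs within one point. It is a textbook formulation, not the authors' code. The 0 to 4 integer scale is an assumption based on the "/4" task score, and the two rating lists are invented toy data, not results from the paper.

```python
from typing import List, Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             categories: Sequence[int]) -> float:
    """kappa = 1 - sum(w*O) / sum(w*E), with w_ij = (i - j)^2 / (k - 1)^2."""
    n, k = len(a), len(categories)
    index = {c: i for i, c in enumerate(categories)}
    observed: List[List[float]] = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[i][j]            # observed disagreement
            den += w * row[i] * col[j] / n       # disagreement expected by chance
    # den == 0 only when both raters use one identical category throughout.
    return 1.0 if den == 0 else 1.0 - num / den


def share_within_one(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of score pairs that differ by at most one point."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


SCALE = [0, 1, 2, 3, 4]                # assumed integer scale
rater_1 = [4, 3, 4, 2, 0, 4, 1, 3]     # toy data, NOT from the paper
rater_2 = [4, 3, 3, 2, 1, 4, 1, 4]     # toy data, NOT from the paper

print(quadratic_weighted_kappa(rater_1, rater_2, SCALE))
print(share_within_one(rater_1, rater_2))
```

The quadratic weights penalize a two-point disagreement four times as heavily as a one-point disagreement, which is why the statistic suits ordinal scores such as these.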


[Submitted on 12 May 2026]


Abstract: Large language models deployed for MAPDL finite-element simulation face practical reliability challenges: without structured execution control, tool encapsulation, and fault recovery, outputs may be inconsistent and task failures are common. The Agent Harness paradigm addresses this by inserting domain-specific orchestration middleware that manages tool lifecycles, workflow state, and recovery escalation. This paper presents the architecture of CAX-Agent, a lightweight agent harness purpose-built for MAPDL automation, and empirically evaluates one of its core components -- the recovery [...]. CAX-Agent organizes execution into three layers -- LLM service, agent harness, and solver backend -- with a recovery ladder that escalates from deterministic rule patching through model-driven regeneration to context enrichment and human intervention. We evaluate three recovery strategies (no_recovery, rule_only, and model_only) on 50 standard structural benchmarks with three repeated runs per strategy (450 case-runs total). Two independent human raters score task completion under blind conditions; inter-rater agreement is strong (quadratic weighted Cohen's kappa = 0.84, 96 percent of score pairs within one point). Model_only achieves the best completion rate (0.9267), task score (3.59/4), total score (9.16/10), and zero-intervention rate (0.84), outperforming rule_only (0.7733, 3.17/4, 7.03/10, 0.00) and no_recovery (0.6933, 2.74/4, 5.60/10, 0.00) with large effect sizes (Cliff's delta = 0.81-0.87). The benchmark uses deliberately simple geometries to isolate recovery-policy effects; we discuss the scope of these findings and directions for broader validation.
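The recovery ladder described in the abstract can be pictured as a loop that tries each rung in order and stops at the first one that yields a successful run. The sketch below is a hypothetical illustration under that reading, not CAX-Agent's implementation: `RunResult`, `run_with_recovery`, `fake_execute`, and `rule_patch` are invented names, and the stub solver only mimics a single failure mode instead of calling MAPDL.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RunResult:
    ok: bool
    error: str = ""


# A rung receives the failing script and the error text and returns a
# revised script, or None when it has nothing to offer.
Rung = Callable[[str, str], Optional[str]]


def run_with_recovery(script: str,
                      execute: Callable[[str], RunResult],
                      ladder: List[Tuple[str, Rung]]) -> Tuple[str, str, RunResult]:
    """Try rungs in order; return (final script, rung that fixed it, result)."""
    result = execute(script)
    if result.ok:
        return script, "none", result
    for name, rung in ladder:
        revised = rung(script, result.error)
        if revised is None:
            continue                      # escalate to the next rung
        script = revised
        result = execute(script)
        if result.ok:
            return script, name, result
    # Every automated rung failed: hand the case to a person.
    return script, "human_intervention", result


def fake_execute(script: str) -> RunResult:
    """Toy stand-in for the solver backend (hypothetical failure rule)."""
    lines = script.splitlines()
    if "SOLVE" in lines and "/SOLU" not in lines:
        return RunResult(False, "SOLVE issued outside /SOLU")
    return RunResult(True)


def rule_patch(script: str, error: str) -> Optional[str]:
    """Deterministic rule: enter the solution processor before SOLVE."""
    if "outside /SOLU" not in error:
        return None
    return script.replace("SOLVE", "/SOLU\nSOLVE", 1)


broken = "/PREP7\nFINISH\nSOLVE\nFINISH"
fixed, stage, outcome = run_with_recovery(broken, fake_execute,
                                          [("rule_patch", rule_patch)])
print(stage, outcome.ok)
```

In this sketch an empty ladder behaves like a no-recovery baseline, and model-driven regeneration or context enrichment would be further `(name, rung)` pairs appended after the rule rung.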

Comments:	8 pages, 6 figures, IEEE conference format
Subjects:	Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Cite as:	arXiv:2605.15218 [cs.AI]
	(or arXiv:2605.15218v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.15218 arXiv-issued DOI via DataCite

Submission history

From: Yichen Hai
[v1] Tue, 12 May 2026 14:46:34 UTC (1,699 KB)

— Originally published at arxiv.org
