Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs
Quick Take
The Deliberate Evolution (DE) framework improves sample efficiency in symbolic regression by decoupling symbolic generation from search control. On LLM-SRBench it outperforms representative LLM-based baselines while using only 40% of the standard sample budget. DE guides LLM proposals with adaptive operators for search direction, analytical tools for structural diagnosis, and reflective memory for trajectory-level experience.
Key Points
- DE outperforms LLM-based symbolic regression baselines while using only 40% of the standard sample budget.
- Decouples symbolic generation from search control, so the LLM no longer has to infer search guidance from a single score.
- Uses adaptive operators for search direction, analytical tools for structural diagnosis, and reflective memory for trajectory-level experience.
- Consistently outperforms representative baselines across diverse scientific domains on LLM-SRBench.
- Addresses the sample inefficiency of existing LLM-based evolutionary methods, which rely mainly on scalar feedback such as MSE.
Article Excerpt
From source RSS / original summary: arXiv:2606.04360v1, Announce Type: new.
Abstract: Symbolic regression (SR) discovers compact mathematical expressions from data, yet recent LLM-based evolutionary methods remain sample-inefficient because they rely mainly on scalar feedback such as MSE. We identify a core limitation: existing methods conflate candidate proposal with search guidance, requiring the LLM to infer how to evolve an expression, diagnose its errors, and reuse past experience from a single score.
To address this, we propose Deliberate Evolution (DE), an agentic framework that decouples symbolic generation from search control. DE guides LLM proposals with adaptive operators for search direction, analytical tools for structural diagnosis, and reflective memory for trajectory-level experience. Experiments on LLM-SRBench show that DE consistently outperforms representative LLM-based SR baselines across diverse scientific domains while using only 40% of the standard sample budget.
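The contrast between scalar feedback and structural diagnosis can be made concrete with a small Python sketch. This is not the paper's implementation: `residual_by_region` and `choose_operator` are hypothetical names, the operator labels ("adjust_constant", "add_term", "stop") and the thresholds are assumptions chosen for illustration, and plain lambdas stand in for LLM-proposed expressions. The sketch only shows how a diagnostic on the residuals can suggest a search direction that a single MSE value does not.

```python
def mse(f, xs, ys):
    """Scalar feedback: one number summarising the whole fit."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def residual_by_region(f, xs, ys, n_regions=3):
    """Hypothetical diagnostic: mean signed residual (y - f(x)) in
    equal-width regions of x. Assumes xs spans a non-zero range."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_regions
    sums, counts = [0.0] * n_regions, [0] * n_regions
    for x, y in zip(xs, ys):
        i = min(int((x - lo) / width), n_regions - 1)  # clamp x == hi
        sums[i] += y - f(x)
        counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def choose_operator(report, tol=1e-9):
    """Hypothetical search-control rule (an assumption, not from the paper):
    a flat residual suggests a missing constant, a varying one a missing term."""
    largest = max(abs(r) for r in report)
    if largest <= tol:
        return "stop"
    if max(report) - min(report) <= 0.1 * largest:
        return "adjust_constant"
    return "add_term"


# Toy data generated from y = x**2 + 1 on [0, 3].
xs = [i / 10 for i in range(31)]
ys = [x ** 2 + 1 for x in xs]

# Stand-ins for candidate expressions an LLM might propose.
candidates = {
    "x**2": lambda x: x ** 2,
    "1": lambda x: 1.0,
    "x**2 + 1": lambda x: x ** 2 + 1,
}

# For each candidate: (scalar score, operator suggested by the diagnostic).
results = {
    name: (mse(f, xs, ys), choose_operator(residual_by_region(f, xs, ys)))
    for name, f in candidates.items()
}
```

In this toy setup, the MSE alone says only how far each candidate is from the data. The regional residuals additionally indicate whether the error is uniform (pointing to a constant offset) or grows with x (pointing to a missing term), which is the kind of guidance a separate search-control component could pass to the proposer.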