Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity
Quick Take
Large language models (LLMs) outperform fine-tuned models at extracting complex circumstances from National Violent Death Reporting System (NVDRS) data.
Key Points
- Developed a 'Complexity Score' algorithm for prompt selection.
- LLMs excel in low-prevalence, inferentially complex circumstances.
- Framework generalizes across multiple frontier LLMs.
Source: arXiv cs.CL