Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
Quick Take
The Micro-Macro Retrieval (M2R) framework reduces hallucination in long-form generation by keeping key information close to the model's outputs through a dual-level retrieval approach. It is trained with a curriculum learning-based reinforcement learning strategy for stable skill acquisition, and experiments across benchmarks show its effectiveness, especially in lengthy-context settings.
Key Points
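- M2R employs a dual-level retrieval approach: macro retrieval fetches coarse-grained evidence from external sources, while micro retrieval extracts essential results from a key information repository built during reasoning.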
- The framework addresses factual accuracy by keeping key information close to model outputs.
- M2R is trained with a curriculum learning-based reinforcement learning strategy that uses customized rule-based rewards to acquire retrieval and grounding skills stably.
- Extensive benchmarks demonstrate M2R's effectiveness in reducing hallucination in long-form tasks.
Article Content
From source RSS / original summary
arXiv:2605.28828v1
Abstract: Large Language Models (LLMs) achieve impressive performance across many tasks but remain prone to hallucination, especially in long-form generation where redundant retrieved contexts and lengthy reasoning chains amplify factual errors. Recent studies highlight a critical phenomenon: the closer key information appears to the model outputs, the higher the factual accuracy.
However, existing retrieval-augmented language models (RALMs) lack effective mechanisms to ensure this proximity: external evidence is injected into reasoning via multi-turn retrieval, but this cannot ensure key information stays close to the outputs. We propose Micro-Macro Retrieval (M2R), a novel retrieve-while-generate framework to fill this gap.
At the macro level, M2R retrieves coarse-grained evidence from external sources; at the micro level, it extracts essential results from a key information repository built during reasoning and reuses them while generating answers. This design directly addresses the key-information-to-output proximity bottleneck, effectively reducing hallucination in long-form tasks.
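The two retrieval levels can be illustrated with a toy sketch. This is not the authors' implementation: the word-overlap scorer, the `KeyInfoRepository` class, the sentence-level distillation step, and the prompt layout are all hypothetical stand-ins chosen only to show the idea of fetching coarse evidence first, storing key facts during reasoning, and then placing those facts immediately before the answer slot.

```python
from dataclasses import dataclass, field


def overlap(a: str, b: str) -> int:
    """Toy relevance score: number of shared lowercase word tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def macro_retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Macro level (assumed form): fetch coarse-grained passages from a corpus."""
    return sorted(corpus, key=lambda doc: overlap(query, doc), reverse=True)[:k]


@dataclass
class KeyInfoRepository:
    """Hypothetical store of key facts collected while reasoning."""
    facts: list[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        if fact not in self.facts:
            self.facts.append(fact)

    def micro_retrieve(self, query: str, k: int = 1) -> list[str]:
        """Micro level (assumed form): pull the most relevant stored facts."""
        return sorted(self.facts, key=lambda f: overlap(query, f), reverse=True)[:k]


def build_answer_prompt(query: str, corpus: list[str]) -> str:
    """Assemble a prompt in which key facts sit right before the answer slot."""
    repo = KeyInfoRepository()
    passages = macro_retrieve(query, corpus)
    # Reasoning phase: keep the best-matching sentence of each coarse passage.
    for passage in passages:
        sentences = [s.strip() for s in passage.split(".") if s.strip()]
        if sentences:
            repo.add(max(sentences, key=lambda s: overlap(query, s)))
    # Answer phase: reuse the key facts directly adjacent to the output.
    key_facts = repo.micro_retrieve(query)
    lines = [f"Question: {query}", "Coarse evidence:"]
    lines += [f"- {p}" for p in passages]
    lines += ["Key information:"] + [f"- {f}" for f in key_facts]
    lines.append("Answer:")
    return "\n".join(lines)


corpus = [
    "The Eiffel Tower stands in Paris. The Eiffel Tower was completed in 1889.",
    "Bananas are yellow. They grow in the tropics.",
]
prompt = build_answer_prompt("When was the Eiffel Tower completed", corpus)
```

In this sketch the coarse passages may be long and partly irrelevant, but the distilled key fact is the last thing in the prompt before `Answer:`, which mirrors the proximity idea described in the abstract.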
M2R is trained with a curriculum learning-based reinforcement learning strategy using customized rule-based rewards, enabling stable acquisition of retrieval and grounding skills. Extensive experiments across different benchmarks demonstrate the effectiveness of M2R, especially in lengthy-context settings.
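The abstract does not describe the actual rewards or curriculum, so the following is only a minimal sketch of what a rule-based reward with curriculum stages could look like. The `<search>` and `<key>` tags, the reward values, and the two-stage schedule are assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical output tags; the abstract does not specify any output format.
SEARCH_TAG = re.compile(r"<search>.+?</search>", re.S)
KEY_TAG = re.compile(r"<key>(.+?)</key>", re.S)


def rule_based_reward(output: str, gold_answer: str, stage: int) -> float:
    """Score one rollout with simple rules; later stages enable more rules."""
    reward = 0.0
    # Every stage: reward issuing at least one retrieval call.
    if SEARCH_TAG.search(output):
        reward += 0.5
    # Stage 2 and later: also reward key information containing the gold answer.
    if stage >= 2:
        keys = KEY_TAG.findall(output)
        if any(gold_answer.lower() in k.lower() for k in keys):
            reward += 1.0
    return reward


rollout = (
    "<search>eiffel tower completion</search> "
    "<key>Completed in 1889</key> Answer: 1889"
)
early = rule_based_reward(rollout, "1889", stage=1)
late = rule_based_reward(rollout, "1889", stage=2)
```

Starting with the easier rule and adding the grounding rule later is one plausible reading of a curriculum: the policy first learns to retrieve at all, then learns to surface the right key information.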