When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering
Quick Take
The study finds that LLMs struggle when retrieved biomedical evidence conflicts, which lowers their accuracy, and it suggests new evaluation methods.
Key Points
- Evaluated six LLMs under various evidence conditions.
- Accuracy drops significantly depending on the order in which conflicting evidence is presented.
- Conflict-aware abstention score improves selective accuracy.
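
To make the last point concrete, here is a minimal sketch of how selective accuracy is usually computed: the model answers only when an abstention score falls below a threshold, and accuracy is measured on the answered questions alone. The summary does not describe how the paper's conflict-aware score is calculated, so the function name `selective_accuracy`, the scores, and the threshold are hypothetical placeholders, not the authors' method.

```python
from typing import Sequence, Tuple


def selective_accuracy(
    correct: Sequence[bool],
    abstain_score: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy on answered questions).

    A question is answered only when its abstention score is below
    `threshold`; higher scores mean "more reason to abstain".
    The scores are illustrative inputs, not the paper's actual score.
    """
    if len(correct) != len(abstain_score):
        raise ValueError("correct and abstain_score must have equal length")

    # Keep the correctness flags of the questions the model chose to answer.
    answered = [c for c, s in zip(correct, abstain_score) if s < threshold]

    coverage = len(answered) / len(correct) if correct else 0.0
    accuracy = sum(answered) / len(answered) if answered else float("nan")
    return coverage, accuracy


# Toy data (hypothetical): wrong answers happen to carry high scores.
correct = [True, False, True, False, True]
scores = [0.1, 0.9, 0.2, 0.8, 0.3]

print(selective_accuracy(correct, scores, threshold=0.5))
print(selective_accuracy(correct, scores, threshold=1.0))
```

In this toy data, lowering the threshold trades coverage for accuracy: the model answers fewer questions but gets more of them right. A useful abstention score is one that concentrates high values on the questions the model would answer incorrectly.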