Transition-Aware best-of-N sampling for Longitudinal Chest X-ray Reports
Quick Answer
The study introduces a training-free, transition-aware best-of-N sampling method for chest X-ray report generation, outperforming random selection, especially in the Impression section.
Quick Take
Transition-aware best-of-N sampling needs no training: it scores each candidate report by how closely its prior-to-current change matches the changes seen in ground-truth training reports. The framework is instantiated with four directional set distances (mean-shift, novelty residual, directed-Hausdorff anchor, and cost-weighted optimal transport) and outperforms random selection, with the largest relative gains on the Impression section.
Key Points
- Introduces transition-aware best-of-N sampling for chest X-ray reports.
- Outperforms random selection, especially in the Impression section.
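- Instantiates the framework with four directional set distances: mean-shift, novelty residual, directed-Hausdorff anchor, and cost-weighted optimal transport.
- Evaluated on a multi-visit AP-PA cohort, under three prompts on three vision-language generators.
- Training-free: the sampling scheme runs on top of pre-trained report generators.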
Paper Resources
Abstract: In longitudinal clinical practice, every chest X-ray is read in the context of the patient's prior exam, and much of what the radiologist communicates is the change from one visit to the next. To the best of our knowledge, we present the first training-free best-of-N sampling scheme for pre-trained chest X-ray report generators that is explicitly aware of this longitudinal prior-to-current transition. We call it transition-aware best-of-N sampling: each report is split into sentences and embedded into an unordered set in $\mathbb{R}^d$; each (prior, current) pair is reduced to a fixed-dimensional directional vector via a set-to-set distance designed to encode the change between the two sets; and candidates are scored by the cosine distance from their candidate transition vector to a cached bank of ground-truth training transition vectors, aggregated as min or kNN. We instantiate the framework with four directional set distances (mean-shift, novelty residual, directed-Hausdorff anchor, and cost-weighted optimal transport) and evaluate on a multi-visit AP-PA cohort, running inference under three prompts on three vision-language generators. Transition-aware best-of-N outperforms random selection across the board, with the largest relative gains on the Impression section.
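The scoring pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: sentence embeddings are taken as given NumPy arrays (no particular embedder is implied), only the mean-shift variant is shown and is assumed here to be the difference of set means, and kNN aggregation is assumed to mean the average of the k smallest cosine distances. All function names (`mean_shift`, `cosine_distances`, `bank_score`, `select_best`) are hypothetical.

```python
import numpy as np


def mean_shift(prior: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Assumed mean-shift transition: mean of current sentence set minus mean of prior set."""
    return current.mean(axis=0) - prior.mean(axis=0)


def cosine_distances(v: np.ndarray, bank: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Cosine distance (1 - cosine similarity) from vector v to every row of bank."""
    v_unit = v / (np.linalg.norm(v) + eps)
    bank_unit = bank / (np.linalg.norm(bank, axis=1, keepdims=True) + eps)
    return 1.0 - bank_unit @ v_unit


def bank_score(v: np.ndarray, bank: np.ndarray, mode: str = "min", k: int = 5) -> float:
    """Aggregate distances to the bank; lower means closer to known transitions."""
    d = cosine_distances(v, bank)
    if mode == "min":
        return float(d.min())
    if mode == "knn":
        # Assumption: kNN aggregation = mean of the k smallest distances.
        k = min(k, d.size)
        return float(np.sort(d)[:k].mean())
    raise ValueError(f"unknown mode: {mode!r}")


def select_best(prior, candidates, bank, mode="min", k=5):
    """Return the index of the lowest-scoring candidate and all candidate scores."""
    scores = [bank_score(mean_shift(prior, c), bank, mode, k) for c in candidates]
    return int(np.argmin(scores)), scores


# Synthetic stand-ins: random vectors replace real sentence embeddings.
rng = np.random.default_rng(0)
dim = 8
bank = rng.normal(size=(50, dim))       # cached training transition vectors
prior = rng.normal(size=(4, dim))       # prior report: 4 sentence embeddings
candidates = [rng.normal(size=(n, dim)) for n in (3, 5, 4)]  # N = 3 sampled reports
best_index, scores = select_best(prior, candidates, bank, mode="knn", k=3)
```

Because the bank is computed once from training pairs and only read at inference time, selection in this sketch costs one matrix-vector product per candidate, which is consistent with the method being training-free.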
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2606.28393 [cs.CV] (or arXiv:2606.28393v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.28393 (arXiv-issued DOI via DataCite) |
Submission history
From: Halil Ibrahim Gulluk
[v1] Tue, 23 Jun 2026 23:11:59 UTC (1,128 KB)
Originally published at arxiv.org