Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers
Quick Answer
This paper introduces a generative foundation model for chest radiograph synthesis with over 1.3B parameters, intended to help diversify clinical datasets and test how well diagnostic models generalize across patient subgroups.
Quick Take
The 1.3B-parameter model is trained from scratch for 1.6T tokens on a curated dataset of 1.2M radiographs with clinical expert-guided metadata. It supports controllable generation and editing, and the authors report images that clinical experts could not distinguish from real radiographs, which they present as a significant advance over the prior state of the art in synthesis fidelity.
Key Points
- Model trained from scratch with over 1.3 billion parameters.
- Trained for 1.6T tokens on a curated dataset of 1.2 million radiographs with clinical expert-guided metadata.
- Supports controllable generation and editing across demographic subgroups, acquisition views, and a dozen pathologies.
- Achieves state-of-the-art fidelity in radiograph synthesis.
- Produces images that clinical experts could not distinguish from real radiographs.
Article Excerpt
arXiv:2606.19460v1. Abstract: We introduce the first generative foundation model for chest radiograph synthesis trained from scratch at the billion-parameter scale. Existing radiographic AI models often suffer from poor generalisation across patient subpopulations, institutions, and acquisition settings, resulting in limited real-world clinical utility.
Controlled, high-fidelity synthesis of chest radiographs is a promising path toward diversifying clinical datasets and evaluating the robustness of diagnostic models. Therefore, we present the largest specialist generative foundation model for chest radiographs to date, with over 1.3B parameters, trained for 1.6T tokens on a curated, heterogeneous dataset comprising 1.2M radiographs and clinical expert-guided metadata.
Our model supports controllable radiograph generation and editing across multiple demographic subgroups, acquisition views, and a dozen pathologies. Moreover, we significantly advance the state of the art in radiograph synthesis fidelity, producing images that are indistinguishable from real radiographs to clinical experts.