Abstract: The scarcity of well-annotated diverse medical images is a major hurdle for
developing reliable AI models in healthcare. Substantial technical advances
have been made in generative foundation models for natural images. Here we
develop ChexGen, a generative vision-language foundation model that
introduces a unified framework for text-, mask-, and bounding box-guided
synthesis of chest radiographs. Built upon the latent diffusion transformer
architecture, ChexGen was pretrained on the largest curated chest X-ray dataset
to date, consisting of 960,000 radiograph-report pairs. Expert evaluations
and quantitative metrics confirm that ChexGen synthesizes accurate
radiographs. We demonstrate the utility of ChexGen for training data
augmentation and supervised pretraining, which yielded performance
improvements across disease classification, detection, and segmentation
tasks using only a small fraction of the training data. Further, our model
enables the creation of diverse patient cohorts that can be used to detect
and mitigate demographic biases, thereby enhancing model fairness. Our
study supports the transformative role of generative
foundation models in building more accurate, data-efficient, and equitable
medical AI systems.
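
To make the conditioning idea concrete, below is a minimal, hypothetical sketch of a latent diffusion sampling loop with classifier-free guidance, where a text- (or mask-/box-) derived embedding steers generation. The ToyDenoiser, dimensions, and schedule are illustrative stand-ins, not ChexGen's actual transformer or training setup.

```python
# Hypothetical sketch (not the authors' code): DDPM-style sampling with
# classifier-free guidance, conditioned on an embedding that could come from
# a report text, a mask, or a bounding box encoder.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Placeholder for a latent diffusion transformer noise predictor."""
    def __init__(self, latent_dim=64, cond_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 128),
            nn.SiLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z, t, cond):
        # Concatenate latent, normalized timestep, and conditioning embedding.
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([z, t_feat, cond], dim=-1))

@torch.no_grad()
def sample(model, cond, steps=50, guidance=4.0, latent_dim=64):
    """Sample a latent given a conditioning embedding `cond`."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(cond.shape[0], latent_dim)
    null_cond = torch.zeros_like(cond)  # unconditional branch
    for i in reversed(range(steps)):
        t = torch.full((cond.shape[0],), i, dtype=torch.long)
        eps_c = model(z, t, cond)
        eps_u = model(z, t, null_cond)
        # Classifier-free guidance: push toward the conditional prediction.
        eps = eps_u + guidance * (eps_c - eps_u)
        a, ab = alphas[i], alpha_bars[i]
        z = (z - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        if i > 0:
            z = z + torch.sqrt(betas[i]) * torch.randn_like(z)
    return z  # in a real pipeline, a VAE decoder maps this latent to an image

model = ToyDenoiser()
text_embedding = torch.randn(1, 32)  # stand-in for a report-text embedding
latents = sample(model, text_embedding)
print(latents.shape)  # torch.Size([1, 64])
```

In a unified framework of the kind the abstract describes, the same sampling loop would accept whichever conditioning embedding is available (text, segmentation mask, or bounding box), which is why the conditioning enters only through `cond`.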