
Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models

📄 Abstract

Breast cancer remains the most commonly diagnosed malignancy among women in the developed world, and early detection through mammography screening plays a pivotal role in reducing mortality. While computer-aided diagnosis (CAD) systems have shown promise in assisting radiologists, existing approaches face critical limitations in clinical deployment, particularly in the nuanced interpretation of multi-modal data and in feasibility, since many require prior clinical history. This study introduces a novel framework that synergistically combines visual features from 2D mammograms with structured textual descriptors derived from easily accessible clinical metadata and synthesized radiological reports through dedicated tokenization modules. The proposed methods demonstrate that strategic integration of convolutional neural networks (ConvNets) with language representations achieves superior performance to vision transformer-based models while handling high-resolution images and enabling practical deployment across diverse populations. Evaluated on screening mammograms from a multi-national cohort, the multi-modal approach outperforms unimodal baselines in cancer detection and calcification identification. The proposed method establishes a new paradigm for developing clinically viable VLM-based CAD systems that leverage imaging data and contextual patient information through effective fusion mechanisms.
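
The abstract describes fusing ConvNet image features with language representations of clinical metadata and report text. The paper does not spell out its architecture here, so the following is only a minimal PyTorch-style sketch of that general idea: a small ConvNet encodes the mammogram, tokenized metadata/report descriptors are embedded and pooled, and the two vectors are fused by concatenation before a classification head. All module names, dimensions, the vocabulary size, and the concatenation-based fusion are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the authors' code) of fusing ConvNet image features
# with embedded clinical-metadata tokens for a finding-classification head.
import torch
import torch.nn as nn

class MammoVLMSketch(nn.Module):
    def __init__(self, vocab_size=128, text_dim=256, img_dim=256, num_classes=2):
        super().__init__()
        # Small ConvNet backbone standing in for the paper's high-resolution
        # mammogram encoder (the actual architecture is not specified here).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, img_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),            # -> (B, img_dim)
        )
        # Tokenized metadata / synthesized-report descriptors are embedded
        # and mean-pooled into a single text vector.
        self.text_embed = nn.Embedding(vocab_size, text_dim, padding_idx=0)
        # Late fusion by concatenation; the paper's actual fusion mechanism
        # may differ (e.g., attention-based).
        self.head = nn.Sequential(
            nn.Linear(img_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, image, text_tokens):
        img_feat = self.backbone(image)                       # (B, img_dim)
        txt_feat = self.text_embed(text_tokens).mean(dim=1)   # (B, text_dim)
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))

if __name__ == "__main__":
    model = MammoVLMSketch()
    imgs = torch.randn(2, 1, 512, 512)       # grayscale mammogram crops
    toks = torch.randint(1, 128, (2, 16))    # metadata/report token ids
    print(model(imgs, toks).shape)           # torch.Size([2, 2])
```

The point of the sketch is the shape of the pipeline (image encoder + text encoder + fusion + head), not any specific design choice made by the authors.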
Authors (4): Shunjie-Fabian Zheng, Hyeonjun Lee, Thijs Kooi, Ali Diba
Submitted: October 29, 2025
arXiv Category: cs.CV

Key Contributions

This study introduces a novel framework for breast cancer detection by synergistically combining visual features from mammograms with structured textual descriptors from clinical metadata and radiological reports. The proposed method strategically integrates ConvNets with language representations, demonstrating superior performance over vision transformer-based models and addressing limitations in multi-modal data interpretation and clinical feasibility.
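
To make the "structured textual descriptors from clinical metadata" idea concrete, here is a hypothetical sketch of how easily accessible fields could be rendered as a short text string for a tokenizer. The field names, phrasing, and BI-RADS density example are illustrative assumptions; the paper's actual descriptor schema is not given on this page.

```python
# Hypothetical metadata-to-text rendering; not the paper's actual schema.
def metadata_to_descriptor(age: int, breast_density: str, view: str, laterality: str) -> str:
    """Render structured clinical metadata as a plain-text descriptor."""
    return (
        f"patient age {age} years; "
        f"breast density {breast_density}; "
        f"{laterality} {view} view"
    )

print(metadata_to_descriptor(58, "BI-RADS C", "CC", "left"))
# -> "patient age 58 years; breast density BI-RADS C; left CC view"
```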

Business Value

This research has the potential to significantly improve early breast cancer detection rates, leading to better patient outcomes and reduced healthcare costs. It can enhance the accuracy and efficiency of radiological assessments, aiding clinicians in making more informed decisions.