
Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models

📄 Abstract

Breast cancer remains the most commonly diagnosed malignancy among women in the developed world, and early detection through mammography screening plays a pivotal role in reducing mortality. While computer-aided diagnosis (CAD) systems have shown promise in assisting radiologists, existing approaches face critical limitations in clinical deployment, particularly in the nuanced interpretation of multi-modal data and in feasibility, since many require prior clinical history. This study introduces a novel framework that synergistically combines visual features from 2D mammograms with structured textual descriptors derived from easily accessible clinical metadata and synthesized radiological reports through dedicated tokenization modules. The proposed methods demonstrate that strategic integration of convolutional neural networks (ConvNets) with language representations achieves superior performance to vision transformer-based models while handling high-resolution images and enabling practical deployment across diverse populations. Evaluated on screening mammograms from a multi-national cohort, the multi-modal approach outperforms unimodal baselines in cancer detection and calcification identification. The proposed method establishes a new paradigm for developing clinically viable VLM-based CAD systems that leverage imaging data and contextual patient information through effective fusion mechanisms.
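
The abstract describes fusing ConvNet image features with language representations of clinical metadata and report text. The paper does not spell out its architecture here, so the following is only a minimal PyTorch-style sketch of that general idea: a small ConvNet encodes the mammogram, tokenized metadata/report descriptors are embedded and pooled, and the two vectors are fused by concatenation before a classification head. All module names, dimensions, the vocabulary size, and the concatenation-based fusion are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the authors' code) of fusing ConvNet image features
# with embedded clinical-metadata tokens for a finding-classification head.
import torch
import torch.nn as nn

class MammoVLMSketch(nn.Module):
    def __init__(self, vocab_size=128, text_dim=256, img_dim=256, num_classes=2):
        super().__init__()
        # Small ConvNet backbone standing in for the paper's high-resolution
        # mammogram encoder (the actual architecture is not specified here).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, img_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),            # -> (B, img_dim)
        )
        # Tokenized metadata / synthesized-report descriptors are embedded
        # and mean-pooled into a single text vector.
        self.text_embed = nn.Embedding(vocab_size, text_dim, padding_idx=0)
        # Late fusion by concatenation; the paper's actual fusion mechanism
        # may differ (e.g., attention-based).
        self.head = nn.Sequential(
            nn.Linear(img_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, image, text_tokens):
        img_feat = self.backbone(image)                       # (B, img_dim)
        txt_feat = self.text_embed(text_tokens).mean(dim=1)   # (B, text_dim)
        return self.head(torch.cat([img_feat, txt_feat], dim=-1))

if __name__ == "__main__":
    model = MammoVLMSketch()
    imgs = torch.randn(2, 1, 512, 512)       # grayscale mammogram crops
    toks = torch.randint(1, 128, (2, 16))    # metadata/report token ids
    print(model(imgs, toks).shape)           # torch.Size([2, 2])
```

The point of the sketch is the shape of the pipeline (image encoder + text encoder + fusion + head), not any specific design choice made by the authors.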
Authors (4): Shunjie-Fabian Zheng, Hyeonjun Lee, Thijs Kooi, Ali Diba
Submitted: October 29, 2025
arXiv Category: cs.CV

Key Contributions

This study introduces a novel framework for breast cancer detection by synergistically combining visual features from mammograms with structured textual descriptors from clinical metadata and radiological reports. The proposed method strategically integrates ConvNets with language representations, demonstrating superior performance over vision transformer-based models and addressing limitations in multi-modal data interpretation and clinical feasibility.
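
To make the "structured textual descriptors from clinical metadata" idea concrete, here is a hypothetical sketch of how easily accessible fields could be rendered as a short text string for a tokenizer. The field names, phrasing, and BI-RADS density example are illustrative assumptions; the paper's actual descriptor schema is not given on this page.

```python
# Hypothetical metadata-to-text rendering; not the paper's actual schema.
def metadata_to_descriptor(age: int, breast_density: str, view: str, laterality: str) -> str:
    """Render structured clinical metadata as a plain-text descriptor."""
    return (
        f"patient age {age} years; "
        f"breast density {breast_density}; "
        f"{laterality} {view} view"
    )

print(metadata_to_descriptor(58, "BI-RADS C", "CC", "left"))
# -> "patient age 58 years; breast density BI-RADS C; left CC view"
```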

Business Value

This research has the potential to significantly improve early breast cancer detection rates, leading to better patient outcomes and reduced healthcare costs. It can enhance the accuracy and efficiency of radiological assessments, aiding clinicians in making more informed decisions.