Abstract
Knowledge distillation (KD) has traditionally relied on a static
teacher-student framework, where a large, well-trained teacher transfers
knowledge to a single student model. However, these approaches often suffer
from knowledge degradation, inefficient supervision, and reliance on either a
very strong teacher model or large labeled datasets, which limits their
effectiveness in real-world, limited-data scenarios. To address these
limitations, we present the first Weakly-supervised Chain-based KD network
(WeCKD), which
redefines knowledge transfer through a structured sequence of interconnected
models. Unlike conventional KD, it forms a progressive distillation chain,
where each model not only learns from its predecessor but also refines the
knowledge before passing it forward. This structured knowledge transfer further
enhances feature learning, reduces data dependency, and mitigates the
limitations of one-step KD. Each model in the distillation chain is trained on
only a fraction of the dataset, demonstrating that effective learning can be
achieved with minimal supervision. Extensive evaluations across four otoscopic
imaging datasets show that WeCKD not only matches but in many cases surpasses
the performance of existing supervised methods. Experimental results
on two other datasets further underscore its generalization across diverse
medical imaging modalities, including microscopic and magnetic resonance
imaging. Furthermore, WeCKD achieves cumulative accuracy gains of up to +23%
over a single backbone trained on the same limited data, highlighting its
potential for real-world adoption.
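
The abstract does not specify implementation details, but the chain mechanism it describes can be sketched in code. Below is a minimal, hypothetical PyTorch illustration, assuming the dataset is split evenly across chain members and that each link uses a standard temperature-scaled distillation loss (KL on soft labels blended with cross-entropy); the function names, hyperparameters, and loss formulation are illustrative assumptions, not WeCKD's published method.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split

def distill_step(student, teacher, x, y, T=4.0, alpha=0.7):
    # Blend hard-label cross-entropy with soft-label KL from the predecessor.
    # The first model in the chain has no teacher and trains on labels alone.
    s_logits = student(x)
    ce = F.cross_entropy(s_logits, y)
    if teacher is None:
        return ce
    with torch.no_grad():
        t_logits = teacher(x)
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    return alpha * kd + (1 - alpha) * ce

def train_chain(make_model, dataset, num_models=4, epochs=5, device="cpu"):
    # Split the data into disjoint fractions, one per chain member, so each
    # model only ever sees a fraction of the dataset (as in the abstract).
    sizes = [len(dataset) // num_models] * num_models
    sizes[-1] += len(dataset) - sum(sizes)  # absorb any remainder
    fractions = random_split(dataset, sizes)
    teacher = None
    for fraction in fractions:
        student = make_model().to(device)
        optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
        loader = DataLoader(fraction, batch_size=32, shuffle=True)
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                optimizer.zero_grad()
                distill_step(student, teacher, x, y).backward()
                optimizer.step()
        student.eval()
        teacher = student  # the refined student becomes the next teacher
    return teacher  # last model in the chain

In this sketch, each successive model inherits its predecessor's soft predictions while training on fresh data, which is one plausible reading of how a chain member "refines the knowledge before passing it forward."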
Authors
Md. Abdur Rahman
Mohaimenul Azam Khan Raiaan
Sami Azam
Asif Karim
Jemima Beissbarth
Amanda Leach
Submitted
October 16, 2025
Key Contributions
WeCKD is the first Weakly-supervised Chain-based KD network that redefines knowledge transfer through a structured sequence of interconnected models. Unlike conventional KD, it forms a progressive distillation chain where each model learns from its predecessor and refines knowledge before passing it forward, enhancing feature learning and reducing data dependency in limited-data scenarios.
Business Value
Enables the development of more accurate and efficient AI diagnostic tools in healthcare, especially in regions or for conditions with scarce labeled medical data.