Abstract: This research explored the hybridization of CNN and ViT on a training
dataset of limited size with a distinct class imbalance. The models were
trained from scratch, with the sole aim of exploring, theoretically and
experimentally, the architectural strengths of the proposed hybrid model.
Experiments were conducted across varied data fractions with balanced and
imbalanced training datasets. In comparison with CNN and ViT, the hybrid model,
which combines the strengths of both, achieved the highest recall of 0.9443
(balanced dataset, 50% data fraction) and a consistent F1 score of around 0.85,
suggesting reliability in diagnosis. Additionally, the model outperformed CNN
and ViT on the imbalanced datasets. Despite its more complex architecture, its
training time was comparable to that of the transformers across all data
fractions.