Abstract
The comparison between discriminative and generative classifiers has
intrigued researchers since Efron's seminal analysis of logistic regression
versus discriminant analysis. While early theoretical work established that
generative classifiers exhibit lower sample complexity but higher asymptotic
error in simple linear settings, these trade-offs remain unexplored in the
transformer era. We present the first comprehensive evaluation of modern
generative and discriminative architectures for text classification:
Auto-regressive modeling, Masked Language Modeling, Discrete Diffusion, and
Encoders.
Our study reveals that the classical 'two regimes' phenomenon manifests
distinctly across different architectures and training paradigms. Beyond
accuracy, we analyze sample efficiency, calibration, noise robustness, and
ordinality across diverse scenarios. Our findings offer practical guidance for
selecting the most suitable modeling approach based on real-world constraints
such as latency and data limitations.
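To make the classical "two regimes" trade-off concrete, the sketch below (not from the paper; the synthetic dataset and sample sizes are illustrative assumptions) reproduces the Ng-and-Jordan-style comparison the abstract alludes to: a generative Gaussian Naive Bayes classifier tends to do better with few training examples, while a discriminative logistic regression overtakes it as data grows.

```python
# Minimal sketch (assumed setup, not the paper's experiments): compare a
# generative classifier (Gaussian Naive Bayes) against a discriminative one
# (logistic regression) across growing training-set sizes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

for n in (50, 200, 1_000, 5_000):  # increasing amounts of labeled data
    Xtr, ytr = X_pool[:n], y_pool[:n]
    gen = GaussianNB().fit(Xtr, ytr)                        # generative
    disc = LogisticRegression(max_iter=1_000).fit(Xtr, ytr)  # discriminative
    print(f"n={n:>5}  NB acc={gen.score(X_test, y_test):.3f}  "
          f"LR acc={disc.score(X_test, y_test):.3f}")
```

Typically the Naive Bayes column rises quickly and plateaus while logistic regression keeps improving, mirroring the lower-sample-complexity / higher-asymptotic-error pattern the paper revisits for transformers.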
Authors (10)
Siva Rajesh Kasa
Karan Gupta
Sumegh Roychowdhury
Ashutosh Kumar
Yaswanth Biruduraju
Santhosh Kumar Kasa
+4 more
Key Contributions
Provides the first comprehensive evaluation of modern generative and discriminative transformer architectures for text classification, analyzing trade-offs in accuracy, sample efficiency, calibration, and robustness. The study shows how the classical 'two regimes' phenomenon manifests in the transformer era and offers practical guidance for model selection.
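Calibration, one of the axes compared here, is commonly quantified with expected calibration error (ECE). The sketch below is a generic binned-ECE implementation under standard assumptions, not the paper's exact evaluation protocol.

```python
# Minimal sketch (assumed metric definition, not the paper's code): expected
# calibration error bins predictions by confidence and averages the gap
# between mean confidence and empirical accuracy, weighted by bin size.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # bin weight times |accuracy - confidence|
    return ece

# Toy usage: an overconfident classifier yields a large ECE.
print(expected_calibration_error([0.9, 0.95, 0.8, 0.85], [1, 0, 1, 0]))
```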
Business Value
Helps organizations choose the most effective transformer-based model for text classification given constraints such as data availability, robustness requirements, and desired output properties, leading to better performance and efficiency.