Introduces Quokka, the first systematic scaling law for diffusion language models (DLMs), covering both compute-constrained and data-constrained regimes. By studying key modeling and optimization design choices, the work offers practical guidance for DLM training and longer-term direction for the AI community; an illustrative sketch of the general scaling-law form appears after this summary.
Provides a framework for more efficient and effective training of large language models, potentially reducing computational costs and accelerating development cycles for AI products.
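This summary does not reproduce Quokka's fitted equations, so the following is a sketch only: it fits a Chinchilla-style parametric loss surface L(N, D) = E + A/N^α + B/D^β over model size N and training tokens D, the standard form for compute-constrained scaling laws. Whether Quokka adopts this exact form for DLMs is an assumption here, and all data and parameter values below are synthetic.

```python
# Hedged sketch: fitting a Chinchilla-style scaling law
#   L(N, D) = E + A / N**alpha + B / D**beta
# This is illustrative only -- Quokka's actual functional form for DLMs
# may differ, and the observations below are synthetic stand-ins.
import numpy as np
from scipy.optimize import curve_fit

def loss_surface(ND, E, A, B, alpha, beta):
    """Parametric loss as a function of model size N and training tokens D."""
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Synthetic (N, D, loss) observations standing in for real training runs.
N = np.array([1e8, 1e8, 1e9, 1e9, 1e10, 1e10])
D = np.array([1e10, 1e11, 1e10, 1e11, 1e11, 1e12])
L = np.array([3.40, 3.10, 2.95, 2.60, 2.35, 2.10])

# Fit the five free parameters; bounds keep the exponents in a sane range.
popt, _ = curve_fit(
    loss_surface, (N, D), L,
    p0=[1.5, 400.0, 400.0, 0.3, 0.3],
    bounds=([0, 0, 0, 0, 0], [5, 1e6, 1e6, 1, 1]),
)
E, A, B, alpha, beta = popt
print(f"E={E:.3f}, A={A:.3g}, B={B:.3g}, alpha={alpha:.3f}, beta={beta:.3f}")
```

For this functional form, minimizing the fitted surface under a fixed compute budget C ≈ 6ND yields the compute-optimal allocation N* ∝ C^(β/(α+β)) and D* ∝ C^(α/(α+β)), which is the kind of training prescription a scaling law like Quokka's aims to provide.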