📄 Abstract
We show that across architecture (Transformer vs. Mamba vs. RWKV), training
dataset (OpenWebText vs. The Pile), and scale (14 million parameters to 12
billion parameters), autoregressive language models exhibit highly consistent
patterns of change in their behavior over the course of pretraining. Based on
our analysis of over 1,400 language model checkpoints on over 110,000 tokens of
English, we find that up to 98% of the variance in language model behavior at
the word level can be explained by three simple heuristics: the unigram
probability (frequency) of a given word, the $n$-gram probability of the word,
and the semantic similarity between the word and its context. Furthermore, we
see consistent behavioral phases in all language models, with their predicted
probabilities for words overfitting to those words' $n$-gram probabilities for
increasing $n$ over the course of training. Taken together, these results
suggest that learning in neural language models may follow a similar trajectory
irrespective of model details.
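To make the three heuristics concrete, below is a minimal sketch (not the authors' pipeline) of how one might compute a word's unigram log probability, its bigram (n = 2) log probability, and its embedding-based similarity to the preceding context, then regress simulated per-word language model scores on them to obtain an R² ("variance explained"). The toy corpus, random word vectors, add-one smoothing, and placeholder model scores are all assumptions for illustration.

```python
# Sketch of the three word-level heuristics from the abstract, used as
# regression predictors of per-word LM log probabilities. Everything marked
# "toy" or "placeholder" is an assumption, not the paper's actual setup.
from collections import Counter
import numpy as np
from numpy.linalg import norm

corpus = "the cat sat on the mat and the dog sat on the rug".split()  # toy corpus

# Heuristic 1: unigram probability (relative word frequency).
unigram_counts = Counter(corpus)
total = sum(unigram_counts.values())
def unigram_logprob(word):
    return np.log(unigram_counts[word] / total)

# Heuristic 2: n-gram probability, here n = 2 with add-one smoothing.
bigram_counts = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigram_counts)
def bigram_logprob(prev, word):
    return np.log((bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size))

# Heuristic 3: semantic similarity between a word and its context, here with
# random static vectors standing in for whatever embeddings one would use.
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=8) for w in unigram_counts}
def context_similarity(context, word):
    ctx = np.mean([embeddings[w] for w in context], axis=0)
    vec = embeddings[word]
    return float(ctx @ vec / (norm(ctx) * norm(vec)))

# Assemble predictors for each word after the first; the regression targets
# are simulated LM log probabilities (placeholder for real checkpoint scores).
rows, targets = [], []
for i in range(1, len(corpus)):
    prev, word = corpus[i - 1], corpus[i]
    rows.append([
        unigram_logprob(word),
        bigram_logprob(prev, word),
        context_similarity(corpus[:i], word),
    ])
    targets.append(bigram_logprob(prev, word) + rng.normal(scale=0.1))

# Ordinary least squares via lstsq; R^2 is the "variance explained" quantity.
X = np.array(rows)
y = np.array(targets)
X1 = np.hstack([X, np.ones((len(X), 1))])  # add intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
pred = X1 @ beta
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(f"R^2 (variance explained by the three heuristics): {r2:.3f}")
```

In the paper's setting, the targets would be the per-word probabilities assigned by each of the 1,400+ checkpoints rather than simulated scores, and the same regression would be repeated across checkpoints to track how the explanatory power of each heuristic changes over training.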
Authors (3)
James A. Michaelov
Roger P. Levy
Benjamin K. Bergen
Submitted
October 28, 2025
Key Contributions
This study shows that autoregressive language models exhibit highly consistent behavioral phases during pretraining, irrespective of architecture (Transformer, Mamba, RWKV), training dataset, or scale. Up to 98% of the variance in word-level model behavior can be explained by three simple heuristics: a word's unigram probability (frequency), its n-gram probability, and its semantic similarity to the context, suggesting that learning follows a similar trajectory regardless of model details.
Business Value
Provides fundamental insight into how language models learn, which could enable more efficient training strategies and better-informed model design. Understanding these consistent patterns can make model development more predictable and reliable.