When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective

Abstract

When and why representations learned by different deep neural networks are similar is an active research topic. We choose to address these questions from the perspective of identifiability theory, which suggests that a measure of representational similarity should be invariant to transformations that leave the model distribution unchanged. Focusing on a model family which includes several popular pre-training approaches, e.g., autoregressive language models, we explore when models which generate distributions that are close have similar representations. We prove that a small Kullback–Leibler divergence between the model distributions does not guarantee that the corresponding representations are similar. This has the important corollary that models with near-maximum data likelihood can still learn dissimilar representations, a phenomenon mirrored in our experiments with models trained on CIFAR-10. We then define a distributional distance for which closeness implies representational similarity, and in synthetic experiments, we find that wider networks learn distributions which are closer with respect to our distance and have more similar representations. Our results thus clarify the link between closeness in distribution and representational similarity.
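To make the underlying invariance concrete, here is a minimal sketch assuming the log-linear (softmax-readout) model family common in this literature; this page does not restate the paper's exact model family:

p_\theta(y \mid x) = \frac{\exp\left(u_\theta(y)^\top h_\theta(x)\right)}{\sum_{y'} \exp\left(u_\theta(y')^\top h_\theta(x)\right)}.

For any invertible matrix M, the substitution h_\theta \mapsto M h_\theta, u_\theta \mapsto M^{-\top} u_\theta leaves every conditional, and hence the KL divergence between models, unchanged, while the representation changes by an arbitrary linear map.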
Authors (4): Beatrix M. G. Nielsen, Emanuele Marconato, Andrea Dittadi, Luigi Gresele
Submitted: June 4, 2025
arXiv Category: cs.LG
arXiv PDF

Key Contributions

Investigates, from an identifiability perspective, when distributional closeness (e.g., a small KL divergence) between deep neural networks implies representational similarity. The paper proves that a small KL divergence between model distributions does not guarantee similar representations, with the corollary that models achieving near-maximum data likelihood can still learn dissimilar ones, and it defines a distributional distance for which closeness does imply representational similarity.
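As a complement, here is a minimal runnable Python sketch of this non-identifiability (not the paper's construction; the dimensions, the map M, and the choice of linear CKA as the similarity measure are illustrative assumptions). It builds two models whose output distributions coincide exactly while their representations differ by an arbitrary invertible linear map:

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 16, 10            # samples, representation dim, classes (hypothetical sizes)

H = rng.normal(size=(n, d))      # representations of model A
U = rng.normal(size=(k, d))      # unembedding of model A

M = rng.normal(size=(d, d))      # arbitrary map, invertible with probability 1
H2 = H @ M.T                     # representations of model B
U2 = U @ np.linalg.inv(M)        # compensating unembedding of model B

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

P1 = softmax(H @ U.T)            # model A's conditionals
P2 = softmax(H2 @ U2.T)          # model B's: identical logits, since H2 @ U2.T == H @ U.T

def linear_cka(X, Y):
    # Centered kernel alignment with linear kernels; 1.0 means identical
    # up to rotation and isotropic scaling.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    return num / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

kl = np.mean(np.sum(P1 * (np.log(P1) - np.log(P2)), axis=1))
print(f"mean KL(P1 || P2) = {kl:.2e}")                  # numerically zero: distributions match
print(f"linear CKA(H, H2) = {linear_cka(H, H2):.3f}")   # typically well below 1: representations differ

Under this toy setup, maximizing likelihood cannot distinguish the two models, which is the sense in which near-maximum likelihood leaves representational similarity unconstrained.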

Business Value

Provides fundamental theoretical insight into when models that are close in distribution also share representations, which can guide the development of more robust, interpretable, and reliable AI models and help explain why similarly trained models can behave differently.