Abstract
Large Language Models (LLMs) exhibit a troubling duality: they are capable of both
remarkable generalization and brittle, verbatim memorization of their training
data. This unpredictability undermines their reliability in high-stakes
applications. In this work, we propose a unified framework to understand,
identify, and control these distinct reasoning modes. First, we introduce a
theoretical model based on the Information Bottleneck (IB) principle,
formalizing generalization as the learning of a compressed, task-relevant
representation and memorization as a failure to compress. Building on this
theory, we develop Dynamic Mode Steering (DMS), a novel inference-time
algorithm comprising two components: (1) a lightweight, causally grounded
linear probe that identifies the model's instantaneous reliance on
memorization, and (2) a dynamic activation steering mechanism that nudges the
model's computation towards pre-identified generalization circuits. We frame
DMS as a form of adaptive, self-contrastive decoding. Experiments on reasoning
and faithfulness tasks demonstrate that DMS significantly improves logical
consistency and factual accuracy, thereby offering a principled approach to
enhancing LLM reliability.
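For context, the compression/relevance tradeoff the abstract appeals to is usually written as the standard Information Bottleneck objective below. This is the textbook form, not necessarily the exact formalization used in the paper, with Z the learned representation, X the input, and Y the task-relevant target.

```latex
% Standard Information Bottleneck objective (textbook form); the paper's exact
% instantiation for LLM representations is not given on this page.
% Z: learned representation, X: input, Y: task-relevant target,
% beta: compression/relevance tradeoff coefficient.
\[
  \min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
\]
% Under this reading, generalization corresponds to compressing X into Z while
% retaining information about Y; memorization corresponds to a failure to
% compress (I(X;Z) stays high without a matching gain in I(Z;Y)).
```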
Submitted
October 25, 2025
Key Contributions
This paper proposes a unified theoretical framework based on the Information Bottleneck principle to understand generalization and memorization in LLMs. It introduces Dynamic Mode Steering (DMS), a novel inference-time algorithm that identifies memorization reliance and steers computation towards generalization circuits, enhancing LLM reliability.
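To make the description above concrete, here is a minimal sketch of what one DMS-style decoding step could look like: a linear probe scores the current hidden state for memorization reliance, and the activation is nudged along a precomputed "generalization" direction in proportion to that score. All names (`dms_step`, `HIDDEN_DIM`, `alpha`), the random placeholder weights, and the specific gating rule are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of a Dynamic Mode Steering (DMS) inference-time step.
# The probe weights and steering direction are random placeholders standing in
# for a trained probe and a direction derived from identified circuits.
import torch

torch.manual_seed(0)

HIDDEN_DIM = 64  # hypothetical hidden size

# (1) Lightweight linear probe: maps a hidden state to a memorization score.
probe = torch.nn.Linear(HIDDEN_DIM, 1)

# (2) Steering direction toward pre-identified "generalization circuits".
steer_dir = torch.nn.functional.normalize(torch.randn(HIDDEN_DIM), dim=0)


def dms_step(hidden: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    """Apply one DMS-style adjustment to a hidden state at inference time.

    hidden: (HIDDEN_DIM,) activation at the current decoding step.
    alpha:  maximum steering strength (hypothetical hyperparameter).
    """
    # Probe the model's instantaneous reliance on memorization, in [0, 1].
    p_mem = torch.sigmoid(probe(hidden)).item()
    # Nudge the activation toward the generalization direction, scaled by
    # how strongly the probe flags memorization.
    return hidden + alpha * p_mem * steer_dir


if __name__ == "__main__":
    h = torch.randn(HIDDEN_DIM)      # stand-in for a real hidden state
    h_steered = dms_step(h)
    print(f"steering shift norm: {(h_steered - h).norm():.3f}")
```

Because the adjustment is gated by the probe's per-step score rather than applied uniformly, the intervention stays adaptive, which is consistent with the paper's framing of DMS as a form of adaptive, self-contrastive decoding.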
Business Value
Increases the trustworthiness and safety of LLMs for critical applications like healthcare, finance, and legal analysis by reducing unpredictable memorization.