Generalization or Memorization: Dynamic Decoding for Mode Steering

📄 Abstract

Large Language Models (LLMs) exhibit a troubling duality: they are capable of both remarkable generalization and brittle, verbatim memorization of their training data. This unpredictability undermines their reliability in high-stakes applications. In this work, we propose a unified framework to understand, identify, and control these distinct reasoning modes. First, we introduce a theoretical model based on the Information Bottleneck (IB) principle, formalizing generalization as the learning of a compressed, task-relevant representation and memorization as a failure to compress. Building on this theory, we develop Dynamic Mode Steering (DMS), a novel inference-time algorithm that comprises two components: (1) a lightweight, causally grounded linear probe that identifies the model's instantaneous reliance on memorization, and (2) a dynamic activation steering mechanism that nudges the model's computation towards pre-identified generalization circuits. We frame DMS as a form of adaptive, self-contrastive decoding. Experiments on reasoning and faithfulness tasks demonstrate that DMS significantly improves logical consistency and factual accuracy, thereby offering a principled approach to enhancing LLM reliability.
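
The abstract's Information Bottleneck framing corresponds to the standard IB objective. The formulation below uses the conventional notation (input X, internal representation Z, task target Y, trade-off coefficient β) and is an assumption about how the paper formalizes the idea, not a quotation from it.

```latex
% Standard Information Bottleneck objective (assumed notation, not the
% paper's exact formulation): learn a stochastic encoding p(z|x) that
% compresses the input X while staying predictive of the target Y.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
% Generalization: I(X;Z) is driven down, leaving a compressed,
% task-relevant representation Z.
% Memorization: I(X;Z) stays high -- the representation fails to compress.
```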
Authors (1)
Xuanming Zhang
Submitted
October 25, 2025
arXiv Category
cs.CL

Key Contributions

This paper proposes a unified theoretical framework, based on the Information Bottleneck principle, for understanding generalization and memorization in LLMs. It introduces Dynamic Mode Steering (DMS), an inference-time algorithm that detects the model's moment-to-moment reliance on memorization and steers its computation towards pre-identified generalization circuits, enhancing LLM reliability.
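
To make the mechanism concrete, here is a minimal sketch of what a single DMS decoding step could look like. Every name and constant below (the probe weights, the generalization direction `v_gen`, the threshold `tau`, the scaling factors `alpha` and `lam`) is an illustrative assumption; the paper's actual probe training and circuit-identification procedures are not reproduced.

```python
# Minimal sketch of one Dynamic Mode Steering (DMS) decoding step.
# All weights and constants are random placeholders, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)
D, V = 64, 100                        # hidden size, toy vocabulary size
W_unembed = rng.normal(size=(V, D))   # stand-in for the model's unembedding

# (1) Lightweight linear probe: maps a hidden state h to a score in [0, 1]
#     estimating reliance on memorization (higher = more memorization-like).
probe_w, probe_b = rng.normal(size=D), 0.0

def memorization_score(h: np.ndarray) -> float:
    return 1.0 / (1.0 + np.exp(-(probe_w @ h + probe_b)))

# (2) Activation steering: nudge h toward a pre-identified "generalization"
#     direction, scaled by how memorization-like the current state looks.
v_gen = rng.normal(size=D)
v_gen /= np.linalg.norm(v_gen)

def steer(h: np.ndarray, score: float, alpha: float = 4.0) -> np.ndarray:
    return h + alpha * score * v_gen

def dms_logits(h: np.ndarray, tau: float = 0.5, lam: float = 1.0) -> np.ndarray:
    """Adaptive self-contrastive decoding: contrast steered vs. base logits,
    weighting the contrast by the probe's memorization score."""
    score = memorization_score(h)
    base_logits = W_unembed @ h
    if score <= tau:                  # already generalizing: leave untouched
        return base_logits
    steered_logits = W_unembed @ steer(h, score)
    return steered_logits + lam * score * (steered_logits - base_logits)

h = rng.normal(size=D)                # one decoding step's hidden state
print(dms_logits(h)[:5])
```

The gating on the probe score is what makes the sketch "dynamic": steps decoded in a generalizing regime are left untouched, so the contrastive intervention is applied only where memorization reliance is detected.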

Business Value

Increases the trustworthiness and safety of LLMs in critical applications such as healthcare, finance, and legal analysis by curbing unpredictable reliance on memorized training data.