📄 Abstract
Reinforcement learning with verifiable rewards (RLVR) can elicit strong
reasoning in large language models (LLMs), while their performance after RLVR
varies dramatically across different base models. This raises a fundamental
question: what microscopic property of pre-trained models leads to this
variation? To investigate, we formalize reasoning as chains of Horn clauses
("if-then" rules) built from features extracted from the LLM's latent space via
cross-layer sparse autoencoders (SAEs). We estimate the transition
probabilities between these features, and further categorize each rule by its
semantic soundness level (e.g., strict, plausible, noisy) with an LLM. Our key
discovery is that high-potential models are inherently soundness-aware: their
internal probability distributions systematically shift across rules' soundness
levels, becoming highly distinct for "strict" versus "noisy" rules. In
contrast, weaker models are soundness-agnostic, collapsing to one distribution
regardless of soundness levels. To quantify this, we introduce the
Soundness-Aware Level (SAL), a microscopic metric using the Jensen-Shannon
Divergence to measure the separation between these distributions. We show that
SAL's predictions of post-RLVR reasoning performance follow a precise empirical
law (R^2=0.87) across diverse model families (Qwen, Mistral, Llama, DeepSeek)
and scales (0.5B-14B). This reveals that a model's reasoning potential is tied
to its intrinsic, pre-trained ability to distinguish sound from unsound
knowledge. These findings underscore the critical role of model pre-training
in shaping reasoning and offer a practical metric grounded in the model's
internal mechanisms for selecting or designing stronger base models.
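
The SAL metric uses the Jensen-Shannon Divergence to measure how far apart a model's rule-probability distributions are across soundness levels. As a rough, self-contained illustration of that idea (not the authors' implementation), the sketch below compares hypothetical transition probabilities assigned to "strict" versus "noisy" rules; the function name, the histogram binning scheme, and the toy Beta-distributed inputs are assumptions for demonstration only.

```python
# Minimal sketch, assuming per-rule transition probabilities have already been
# estimated and each rule has been labeled "strict" or "noisy" by an LLM judge.
# This is an illustration of the separation idea behind SAL, not the paper's code.
import numpy as np
from scipy.spatial.distance import jensenshannon

def soundness_separation(strict_probs, noisy_probs, bins=20):
    """Histogram each group's transition probabilities and return the
    Jensen-Shannon divergence between the two empirical distributions."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(strict_probs, bins=edges)
    q, _ = np.histogram(noisy_probs, bins=edges)
    # Normalize to probability mass; add a small constant to avoid empty bins.
    p = (p + 1e-9) / (p + 1e-9).sum()
    q = (q + 1e-9) / (q + 1e-9).sum()
    # scipy returns the JS *distance* (the square root); square it for the divergence.
    return jensenshannon(p, q, base=2) ** 2

# Toy usage: a "soundness-aware" model assigns systematically higher probabilities
# to strict rules than to noisy ones, so the two histograms are well separated.
strict = np.random.beta(8, 2, size=1000)  # probabilities skewed toward 1
noisy = np.random.beta(2, 8, size=1000)   # probabilities skewed toward 0
print(soundness_separation(strict, noisy))
```

Under this toy setup, a soundness-aware model yields a large divergence because strict and noisy rules receive systematically different probabilities, whereas a soundness-agnostic model's two histograms overlap and the divergence approaches zero.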
Authors (4)
Xuansheng Wu
Xiaoman Pan
Wenlin Yao
Jianshu Chen
Submitted
October 17, 2025
Key Contributions
This paper identifies 'soundness-awareness' as a microscopic signature of pre-trained LLMs that predicts their reasoning performance after RLVR training. It formalizes reasoning as chains of Horn clauses built from latent-space features (extracted via cross-layer SAEs) and shows that high-potential models exhibit distinct probability distributions across soundness levels, whereas weaker models collapse to a single distribution.
Business Value
Enables better selection and fine-tuning of LLMs for tasks requiring complex reasoning, leading to more reliable and capable AI systems.