Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

📄 Abstract

Reasoning ability, a core component of human intelligence, continues to pose a significant challenge for Large Language Models (LLMs) in the pursuit of AGI. Although model performance has improved under the training scaling law, significant challenges remain, particularly with respect to training algorithms, such as catastrophic forgetting, and the limited availability of novel training data. As an alternative, test-time scaling enhances reasoning performance by increasing test-time computation without parameter updating. Unlike prior methods in this paradigm focused on token space, we propose leveraging latent space for more effective reasoning and better adherence to the test-time scaling law. We introduce LatentSeek, a novel framework that enhances LLM reasoning through Test-Time Instance-level Adaptation (TTIA) within the model's latent space. Specifically, LatentSeek leverages policy gradient to iteratively update latent representations, guided by self-generated reward signals. LatentSeek is evaluated on a range of reasoning benchmarks, including GSM8K, MATH-500, and AIME2024, across multiple LLM architectures. Results show that LatentSeek consistently outperforms strong baselines, such as Chain-of-Thought prompting and fine-tuning-based methods. Furthermore, our analysis demonstrates that LatentSeek is highly efficient, typically converging within a few iterations for problems of average complexity, while also benefiting from additional iterations, thereby highlighting the potential of test-time scaling in the latent space. These findings position LatentSeek as a lightweight, scalable, and effective solution for enhancing the reasoning capabilities of LLMs.

Authors (11)

Hengli Li

Chenxi Li

Tong Wu

Xuekai Zhu

Yuxuan Wang

Zhaoxin Yu

+5 more

Submitted

May 19, 2025

arXiv Category

cs.LG

Key Contributions

This paper introduces LatentSeek, a novel framework that enhances LLM reasoning through Test-Time Instance-level Adaptation (TTIA) in the latent space. It uses policy gradient to iteratively update latent representations guided by self-generated rewards, offering an alternative to token-space adaptation.
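To make the idea of updating latent representations with a policy gradient concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation. Everything in it is an assumption made for illustration: the tiny fixed `HEAD` matrix stands in for a frozen LLM output layer, `toy_reward` stands in for a self-generated reward, and the expectation in the policy-gradient identity is computed exactly by enumerating all sequences (feasible only for a toy vocabulary), where a real system would have to estimate it from decoded samples.

```python
import itertools
import math

# Hypothetical toy "LM head": maps a 2-d latent vector to logits over 3 tokens.
# It is frozen; only the latents are ever updated.
HEAD = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
VOCAB = range(len(HEAD))


def token_probs(latent):
    """Softmax over the logits HEAD @ latent for one position."""
    logits = [sum(w * h for w, h in zip(row, latent)) for row in HEAD]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(latents):
    """Pick the most likely token at every position."""
    decoded = []
    for latent in latents:
        probs = token_probs(latent)
        decoded.append(max(VOCAB, key=lambda v: probs[v]))
    return decoded


def expected_reward_and_grad(latents, reward_fn):
    """Return E[R] and its gradient with respect to each latent vector.

    Uses the policy-gradient identity
        grad_z E[R(x)] = E[R(x) * grad_z log pi(x | z)],
    with the expectation taken exactly over every possible sequence.
    """
    probs = [token_probs(latent) for latent in latents]
    dim = len(latents[0])
    grads = [[0.0] * dim for _ in latents]
    expected = 0.0
    for seq in itertools.product(VOCAB, repeat=len(latents)):
        weight = math.prod(probs[t][tok] for t, tok in enumerate(seq))
        reward = reward_fn(seq)
        expected += weight * reward
        for t, tok in enumerate(seq):
            # grad of log pi(tok | z_t) w.r.t. z_t is HEAD^T (onehot(tok) - probs_t)
            for v in VOCAB:
                coeff = (1.0 if v == tok else 0.0) - probs[t][v]
                for k in range(dim):
                    grads[t][k] += weight * reward * coeff * HEAD[v][k]
    return expected, grads


def latent_policy_gradient(latents, reward_fn, steps=200, lr=1.0):
    """Gradient ascent on copies of the latents; HEAD is never modified."""
    latents = [list(latent) for latent in latents]
    history = []
    for _ in range(steps):
        expected, grads = expected_reward_and_grad(latents, reward_fn)
        history.append(expected)
        for latent, grad in zip(latents, grads):
            for k in range(len(latent)):
                latent[k] += lr * grad[k]
    return latents, history


# Toy stand-in for a self-generated reward (an assumption for this sketch):
# the fraction of positions that match a fixed target sequence.
TARGET = (2, 1, 0)


def toy_reward(seq):
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


initial = [[1.0, 0.0]] * 3  # every position initially decodes to token 0
final, history = latent_policy_gradient(initial, toy_reward)
print(greedy_decode(initial), "->", greedy_decode(final))
print("expected reward:", round(history[0], 3), "->", round(history[-1], 3))
```

The point of the sketch is the shape of the loop: the model weights (`HEAD`) stay fixed, the reward is treated as a black box over decoded sequences, and only the per-instance latent vectors move. Step count and learning rate here are arbitrary toy values, not settings reported in the paper.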

Business Value

Significantly improves the reasoning capabilities of deployed LLMs without requiring retraining, making them more effective for complex tasks and reducing the need for constant model updates.

Paper Metadata

Innovation Type

Novel Test-Time Adaptation Method in Latent Space

Deployment Feasibility

Feasible, since it operates at test time without parameter updates. It does require careful reward design and extra compute for the iterative adaptation process.

Limitations Addressed

Limited reasoning capabilities of LLMs, Catastrophic forgetting during training, Scarcity of novel training data, Inefficiency of token-space adaptation methods

Technical Tags

reasoning, large language models, test-time adaptation, latent space, policy gradient, instance-level adaptation, catastrophic forgetting, test-time scaling

Research Topics

LLM Reasoning, Test-Time Adaptation, Continual Learning, Machine Learning Optimization

Methods & Architectures

Test-Time Instance-level Adaptation (TTIA), Policy gradient, Latent space manipulation, Self-generated rewards, Large Language Models (LLMs)

Applications & Tasks

Natural language understanding, Complex problem solving, AI reasoning tasks, Enhancing LLM reasoning, Improving test-time performance, Mitigating catastrophic forgetting, Improving reasoning ability of LLMs at test time

Related Fields

Natural Language Processing, Machine Learning, Artificial Intelligence, Reinforcement Learning

Keywords

LLM reasoning, test-time adaptation, latent space, policy gradient, instance-level adaptation, test-time scaling, catastrophic forgetting, natural language processing, artificial intelligence, deep learning

Academic Context

#LLM Reasoning, #Test-Time Adaptation, #Continual Learning, #Machine Learning Optimization

Commercial Potential

Potential Products

LLM reasoning enhancement modules, Adaptive AI systems

Target Industries

Technology, Customer Service, Research and Development, Education

Use Case Examples

Improving chatbot reasoning for complex queries, Enhancing AI assistants' problem-solving skills, Developing more capable AI for scientific reasoning

Competitive Edge

Offers a novel approach to enhance LLM reasoning by operating in the latent space at test time, potentially outperforming token-space methods.

Resource Requirements

Compute Needs

Moderate to High (at test time for adaptation)

Data Requirements

No specific training datasets required; operates on existing LLMs.

Deployment Constraints

Increased inference latency due to test-time adaptation

Scalability

Scales with the complexity of the LLM and the number of adaptation steps.