Abstract
We revisit test-time scaling for language model reasoning and ask a fundamental question: at an equal token and compute budget, is it better to run multiple independent chains in parallel, or to run fewer chains that iteratively refine through sequential steps? Through comprehensive evaluation across 5 state-of-the-art open-source models and 3 challenging reasoning benchmarks, we find that sequential scaling, where chains explicitly build upon previous attempts, consistently outperforms the dominant parallel self-consistency paradigm in 95.6% of configurations, with accuracy gains of up to 46.7%. Further, we introduce inverse-entropy weighted voting, a novel training-free method that further boosts the accuracy of sequential scaling. By weighting answers in proportion to the inverse entropy of their reasoning chains, we increase the success rate over parallel majority voting and establish this approach as the optimal test-time scaling strategy. Our findings fundamentally challenge the parallel reasoning orthodoxy that has dominated test-time scaling since the introduction of self-consistency decoding (Wang et al., 2022), positioning sequential refinement as the robust default for modern LLM reasoning and necessitating a paradigm shift in how we approach inference-time optimization.
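To make the parallel-versus-sequential comparison concrete, here is a minimal sketch of the two strategies at a matched chain budget. The `generate` (one model call returning a reasoning chain) and `extract_answer` (answer parsing) callables are hypothetical stand-ins, and the refinement prompt wording is our illustration, not the paper's exact protocol.

```python
from collections import Counter

def parallel_self_consistency(generate, extract_answer, prompt, budget):
    """Parallel baseline: `budget` independent chains, plain majority vote."""
    answers = [extract_answer(generate(prompt)) for _ in range(budget)]
    return Counter(answers).most_common(1)[0][0]

def sequential_refinement(generate, extract_answer, prompt, budget):
    """Sequential scaling: each new chain is conditioned on the previous
    attempt and asked to review and refine it; the last answer is kept."""
    attempt = generate(prompt)
    for _ in range(budget - 1):
        attempt = generate(
            f"{prompt}\n\nPrevious attempt:\n{attempt}\n\n"
            "Review the attempt above, correct any mistakes, "
            "and give a final answer."
        )
    return extract_answer(attempt)
```

Both functions consume the same number of model calls, which is the matched-budget condition under which the paper reports sequential scaling winning in 95.6% of configurations.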
Key Contributions
This paper demonstrates that sequential scaling, where chains iteratively refine previous attempts, consistently outperforms parallel self-consistency at matched compute budgets for LLM reasoning. It introduces inverse-entropy weighted voting as a novel, training-free method to further boost sequential scaling accuracy.
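The abstract does not pin down how chain entropy is computed, so the sketch below uses the mean negative log-probability of a chain's sampled tokens as a stand-in entropy measure; the function names, the epsilon guard, and the toy numbers are all illustrative assumptions. It shows the core idea: each chain's final answer receives a vote weighted by the inverse of its chain entropy, so confident (low-entropy) chains count for more.

```python
from collections import defaultdict

def mean_token_entropy(token_logprobs):
    """Entropy proxy (an assumption): mean negative log-probability of
    the tokens the model actually sampled; lower means more confident."""
    return -sum(token_logprobs) / len(token_logprobs)

def inverse_entropy_vote(chains, eps=1e-8):
    """Aggregate answers with weights proportional to 1 / entropy.

    `chains` is a list of (answer, token_logprobs) pairs, one per
    reasoning chain.
    """
    weights = defaultdict(float)
    for answer, logprobs in chains:
        # eps guards against a (near-)zero-entropy chain dividing by zero
        weights[answer] += 1.0 / (mean_token_entropy(logprobs) + eps)
    return max(weights, key=weights.get)

# Toy example: two confident chains agree on "42"; one uncertain chain
# says "41". The inverse-entropy weights favor the confident answer.
chains = [
    ("42", [-0.05, -0.10, -0.02]),
    ("42", [-0.20, -0.15, -0.10]),
    ("41", [-1.20, -0.90, -1.50]),
]
print(inverse_entropy_vote(chains))  # prints "42"
```

Because the weights come directly from token log-probabilities that standard inference APIs already expose, the scheme requires no extra forward passes or training, consistent with the paper's training-free claim.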
Business Value
Enables more efficient and accurate use of LLMs for complex reasoning tasks by optimizing how computational resources are utilized during inference, leading to better performance without increased cost.