Once Upon an Input: Reasoning via Per-Instance Program Synthesis

Abstract

Large language models (LLMs) excel at zero-shot inference but continue to struggle with complex, multi-step reasoning. Recent methods that augment LLMs with intermediate reasoning steps, such as Chain of Thought (CoT) and Program of Thought (PoT), improve performance but often produce undesirable solutions, especially in algorithmic domains. We introduce Per-Instance Program Synthesis (PIPS), a method that generates and refines programs at the instance level using structural feedback, without relying on task-specific guidance or explicit test cases. To further improve performance, PIPS incorporates a confidence metric that dynamically chooses between direct inference and program synthesis on a per-instance basis. Experiments across three frontier LLMs and 30 benchmarks, including all tasks of Big Bench Extra Hard (BBEH), visual question answering tasks, relational reasoning tasks, and mathematical reasoning tasks, show that PIPS improves the absolute harmonic mean accuracy by up to 8.6% and 9.4% compared to PoT and CoT, respectively, and reduces undesirable program generations by 65.1% on the algorithmic tasks compared to PoT with Gemini-2.0-Flash.
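The headline numbers above are reported as harmonic mean accuracy. As a rough illustration of why that aggregate is stricter than a plain average, here is a minimal sketch. It assumes the metric is the ordinary harmonic mean over per-task accuracies (the paper may apply additional adjustments), and the task names and accuracy values are made up for illustration, not results from the paper.

```python
from statistics import harmonic_mean


def harmonic_mean_accuracy(per_task_accuracy):
    """Aggregate per-task accuracies (fractions in [0, 1]) with the harmonic mean.

    Assumption: plain harmonic mean; the paper's exact aggregation may differ.
    """
    values = list(per_task_accuracy.values())
    if not values:
        raise ValueError("need at least one task accuracy")
    return harmonic_mean(values)


# Made-up accuracies for illustration only; not results from the paper.
toy = {"task_a": 0.9, "task_b": 0.6, "task_c": 0.3}

hm = harmonic_mean_accuracy(toy)      # dominated by the weakest task
am = sum(toy.values()) / len(toy)     # arithmetic mean, for comparison

print(f"harmonic mean:   {hm:.4f}")
print(f"arithmetic mean: {am:.4f}")
```

Because the harmonic mean is pulled toward the lowest value, a method has to do reasonably well on every benchmark to score well, rather than excelling on a few.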
Authors: Adam Stein, Neelay Velingker, Mayur Naik, Eric Wong
Submitted: October 26, 2025
arXiv category: cs.CL

Key Contributions

The paper introduces Per-Instance Program Synthesis (PIPS), a method that generates and refines programs at the instance level using structural feedback, without task-specific guidance or explicit test cases. A confidence metric chooses between direct inference and program synthesis for each instance. Across three frontier LLMs and 30 benchmarks, PIPS improves absolute harmonic mean accuracy by up to 8.6% over PoT and 9.4% over CoT.
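The control flow described above (route each instance, then generate and refine a program from structural feedback) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the function names, the `threshold` and `max_refinements` parameters, the direction of the confidence score, and the specific structural checks are all assumptions, and the model calls are replaced by toy stubs.

```python
import ast
from typing import Callable, List


def structural_issues(source: str) -> List[str]:
    """Hypothetical structural checks on a candidate program (no test cases used)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]
    funcs = [n for n in tree.body
             if isinstance(n, ast.FunctionDef) and n.name == "solve"]
    if not funcs:
        return ["no solve() function defined"]
    body = funcs[0].body
    # Flag programs that only return a hard-coded literal (no real computation).
    if (len(body) == 1 and isinstance(body[0], ast.Return)
            and isinstance(body[0].value, ast.Constant)):
        return ["solve() returns a hard-coded constant"]
    return []


def run_program(source: str) -> str:
    namespace: dict = {}
    exec(source, namespace)  # illustration only; sandbox untrusted code in practice
    return str(namespace["solve"]())


def solve_instance(question: str,
                   confidence: Callable[[str], float],
                   answer_directly: Callable[[str], str],
                   write_program: Callable[[str, List[str]], str],
                   threshold: float = 0.5,       # assumed cutoff
                   max_refinements: int = 3) -> str:
    # Assumption: a low score means program synthesis is unlikely to help.
    if confidence(question) < threshold:
        return answer_directly(question)
    feedback: List[str] = []
    for _ in range(max_refinements + 1):
        program = write_program(question, feedback)
        feedback = structural_issues(program)
        if not feedback:
            return run_program(program)
    # Assumed fallback when no candidate passes the structural checks.
    return answer_directly(question)


# Toy stand-in for an LLM: the first draft hard-codes the answer, the revision computes it.
def toy_write_program(question: str, feedback: List[str]) -> str:
    if not feedback:
        return "def solve():\n    return 5050"
    return "def solve():\n    return sum(range(1, 101))"


result = solve_instance(
    "What is the sum of the integers from 1 to 100?",
    confidence=lambda q: 0.9,
    answer_directly=lambda q: "unknown",
    write_program=toy_write_program,
)
print(result)
```

In this toy run the first draft is rejected for hard-coding its answer, and the revised draft, which actually computes the sum, is the one that gets executed.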

Business Value

By strengthening multi-step reasoning, PIPS could make AI assistants and automated code generation more capable and improve performance on scientific and engineering problems.