arxiv_ai 95% Match Research Paper AI researchers,Mathematicians,Computer scientists,Educators,Students in advanced math/CS 2 weeks ago

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

large-language-models › reasoning

📄 Abstract

Abstract: Mathematical reasoning is central to artificial intelligence, with applications in education, code generation, and research-level mathematical discovery. Mathematical competitions highlight two problem types: theorem proving, requiring rigorous proofs, and answer construction, requiring creative generation and formal verification of mathematical objects. Existing research reveals that LLMs can tackle difficult answer-construction tasks but are prone to errors from hallucinations and unverifiable steps, while symbolic methods guarantee rigor but falter in creative answer construction. This raises a key understudied question: how to solve answer-construction problems while preserving both LLM creativity and mathematical rigor? To address this problem, we introduce the Enumerate-Conjecture-Prove (ECP) framework, a modular neuro-symbolic method integrating LLM-based enumeration and pattern-driven conjecturing with formal theorem proving in Lean, and ConstructiveBench, a dataset of 3,640 formal answer-construction problems from math competitions. ECP is model agnostic and shows consistent improvements over pure LLM baselines: on the subset of PutnamBench for answer construction, ECP formally solves 6 out of 337 answer-construction problems end to end (up from 4 without ECP) using GPT-5 mini and DeepSeek-Prover-V2-7B. On ConstructiveBench, ECP achieves 33.1% end-to-end state-of-the-art accuracy (up from 32.5%), demonstrating its potential to advance formal mathematical reasoning by combining LLM conjecturing with formal verification. Our code and dataset are publicly available at GitHub (https://github.com/sunjia72/ECP) and Hugging Face (https://huggingface.co/datasets/sunjia72/ConstructiveBench).

Authors (5)

Jialiang Sun

Yuzhi Tang

Ao Li

Chris J. Maddison

Kuldeep S. Meel

Submitted

May 24, 2025

arXiv Category

cs.AI

arXiv PDF

Key Contributions

Introduces the Enumerate-Conjecture-Prove (ECP) framework, a novel neuro-symbolic method that combines LLM creativity for enumeration and conjecturing with the rigor of formal theorem proving in Lean. This framework addresses the challenge of solving answer-construction problems in mathematics while preserving both LLM capabilities and formal guarantees, supported by the new ConstructiveBench dataset.

Business Value

Enhances AI's capability in formal reasoning and creative problem-solving, potentially accelerating mathematical discovery, improving educational tools, and advancing AI for scientific research.

Paper Metadata

Innovation Type

Hybrid Approach/Framework

Deployment Feasibility

Requires integration of LLMs with formal proof assistants like Lean, which can be complex but is feasible in research settings.

Limitations Addressed

LLMs' tendency for hallucinations and unverifiable steps in answer construction; symbolic methods' limitations in creative answer generation.

Performance Gains

Not specified in abstract, but aims to overcome limitations of existing methods.

Technical Tags

neuro-symbolic methodsmathematical reasoninganswer constructiontheorem provingLLMsformal verificationLeanconstructive mathematicsEnumerate-Conjecture-Prove (ECP)ConstructiveBench

Research Topics

Mathematical AIFormal MethodsNeuro-Symbolic AIAI for EducationAutomated Theorem Proving

Methods & Architectures

Neuro-symbolic integrationLLM-based enumerationPattern-driven conjecturingFormal theorem proving (Lean)ConstructiveBench dataset Enumerate-Conjecture-Prove (ECP) frameworkLLMsSymbolic solvers

Applications & Tasks

Mathematics AI Research Education Code Generation Mathematical Answer ConstructionFormal Verification of Mathematical ObjectsCreative Mathematical Problem Solving Solving answer-construction problems in math competitionsGenerating and verifying mathematical objectsBridging LLM creativity with mathematical rigor

Datasets & Benchmarks

Datasets

ConstructiveBench

Related Fields

Artificial IntelligenceFormal MethodsLogicComputer ScienceMathematics Education

Keywords

mathematical reasoningLLMsneuro-symbolic AIformal verificationtheorem provinganswer constructionLeanmath competitionscreativityrigorEnumerate-Conjecture-ProveConstructiveBenchAI for math

Academic Context

#Mathematical AI#Formal Methods#Neuro-Symbolic AI#AI for Education#Automated Theorem Proving

Technology Stack

Frameworks & Libraries

Lean

Commercial Potential

Potential Products

AI-powered mathematical problem solversAutomated theorem provers for complex proofsEducational tools for learning advanced mathematics

Target Industries

EducationResearch & DevelopmentSoftware Development (formal verification)

Use Case Examples

Assisting students in solving challenging problems in math olympiads.Automating parts of formal verification for software or hardware.

Competitive Edge

Offers a unique hybrid approach that aims to combine the strengths of LLMs and symbolic methods, addressing a gap not fully covered by either alone.

Market Opportunity

Growing market for AI in education and scientific discovery.

Revenue Models

SaaS for educational platformslicensing for research tools.

Resource Requirements

Compute Needs

High (for LLM inference and symbolic computation)

Data Requirements

Large, diverse dataset of mathematical problems and proofs (ConstructiveBench).

Deployment Constraints

Integration complexity, computational cost, need for specialized expertise (Lean).

Scalability

Scalability depends on the efficiency of both LLM components and the formal prover.

Production Readiness

Maturity Level

Research/Experimental

Time to Market

Long

View Full Paper Back to Papers