Abstract
Logical reasoning with large language models (LLMs) has received growing
attention. One mainstream approach translates natural language into formal
logic and then applies symbolic solvers for deduction. While effective in many
tasks, these LLM-based translators often fail to generate consistent symbolic
representations when the same concept appears in different linguistic forms.
Such inconsistencies break logical coherence and lead to solver errors.
However, most existing benchmarks lack this type of linguistic variation, which
frequently occurs in real-world text, leaving the problem underexplored. To
address this gap, we present SoLT, a benchmark that systematically rewrites
reasoning datasets into diverse yet logically equivalent forms across multiple
levels. Beyond evaluation, SoLT also provides a general method to enrich any
dataset with linguistic diversity while preserving both meaning and logic. To
further enhance the stability of LLM-based reasoning, we propose MenTaL, which
explicitly guides models to build a concept-symbol mapping table during
translation. By linking equivalent expressions to shared symbols, MenTaL
maintains consistency and mitigates symbol drift. Experiments on SoLT
demonstrate that LLMs indeed suffer from inconsistent symbol mapping under
linguistic variation, leading to significant drops in reasoning accuracy.
In contrast, applying MenTaL brings clear and stable performance improvements
across diverse inputs. Overall, our findings reveal that overlooking linguistic
diversity hides key weaknesses in LLM-based translators, and our work offers a
step toward more reliable logical reasoning in varied real-world scenarios. Our
code is available at https://github.com/wufeiwuwoshihua/LinguDiver.
Authors (7)
Qingchuan Li
Jiatong Li
Zirui Liu
Mingyue Cheng
Yuting Zeng
Qi Liu
+1 more
Key Contributions
Addresses the instability of LLMs in translating natural language to formal logic under linguistic variation. Introduces SoLT, a benchmark that systematically enriches datasets with diverse yet logically equivalent forms, and proposes MenTaL to enhance LLM reasoning stability, aiming to improve the reliability of LLM-based logical deduction.
Business Value
Enhances the reliability of AI systems performing logical reasoning, crucial for applications in legal tech, formal verification, and complex decision support systems.