Abstract
Graph Chain-of-Thought (Graph-CoT) enables large language models (LLMs) to
perform step-by-step reasoning over graph-structured knowledge, but existing
pipelines suffer from low accuracy, excessive token usage, high latency, and
low throughput due to single-agent monolithic prompts, repeated context
re-encoding, and inefficient serving execution. We present GLM, the first
multi-agent Graph-CoT system co-designed with an optimized LLM serving
architecture. GLM decomposes reasoning into specialized agents for
classification, reasoning, action generation, and graph retrieval, enabling
branching and selective context sharing to reduce prompt length and reasoning
iterations while preserving reasoning quality, thereby improving accuracy and
reducing overall token consumption. To scale inference, we introduce a
Graph-CoT-aware LLM inference mechanism with graph-specific KV-cache
management, priority-based eviction, and pipelined execution to improve serving
efficiency. Experiments demonstrate that GLM improves answer accuracy by up to
38%, reduces token cost by up to 95.7%, lowers inference latency by 90.3%, and
achieves up to 15.1x higher throughput compared to state-of-the-art Graph-CoT
baselines, enabling efficient adoption for complex real-world reasoning at
scale.
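The multi-agent decomposition described in the abstract can be pictured as a small control loop in which a classification agent decides whether more graph evidence is needed, a reasoning agent produces the next thought over a selectively shared context, an action-generation agent turns that thought into a structured graph query, and a retrieval agent executes it. The Python sketch below is a minimal illustration under assumptions: the function names (classify, reason, generate_action, retrieve), the stopping rule, and the shared-state layout are hypothetical placeholders, not GLM's actual agents, prompts, or interfaces.

```python
# Minimal sketch of a multi-agent Graph-CoT loop (illustrative only).
# All names and the control flow are assumptions, not GLM's implementation.

from dataclasses import dataclass, field

@dataclass
class GraphCoTState:
    question: str
    thoughts: list = field(default_factory=list)   # reasoning steps so far
    evidence: list = field(default_factory=list)   # retrieved graph facts

def classify(state: GraphCoTState) -> str:
    """Classification agent: decide whether the question still needs graph
    retrieval or can be answered from the evidence gathered so far."""
    return "retrieve" if len(state.evidence) < 3 else "answer"

def reason(state: GraphCoTState) -> str:
    """Reasoning agent: produce the next thought, conditioned only on the
    selectively shared context (question + evidence), not the full history."""
    return f"Need facts about: {state.question} (step {len(state.thoughts) + 1})"

def generate_action(thought: str) -> dict:
    """Action-generation agent: turn a thought into a structured graph query."""
    return {"op": "neighbors", "node": thought.split(":")[-1].strip()}

def retrieve(action: dict, graph: dict) -> list:
    """Graph-retrieval agent: execute the query against the knowledge graph."""
    return graph.get(action["node"], [])

def graph_cot(question: str, graph: dict, max_steps: int = 4) -> GraphCoTState:
    """Iterate the four agents until the classifier decides to answer."""
    state = GraphCoTState(question)
    for _ in range(max_steps):
        if classify(state) == "answer":
            break
        thought = reason(state)
        state.thoughts.append(thought)
        state.evidence.extend(retrieve(generate_action(thought), graph))
    return state
```

Because each agent sees only the context it needs, prompts stay short and fewer full-history re-encodings are required per iteration, which is the mechanism the abstract credits for the token and latency savings.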
Authors (14)
Chengying Huan
Ziheng Meng
Yongchao Liu
Zhengyi Yang
Yun Zhu
Yue Yun
+8 more
Submitted
November 3, 2025
Key Contributions
Introduces GLM, the first multi-agent Graph-CoT system integrated with an optimized LLM serving architecture. It decomposes reasoning into specialized agents and employs graph-specific KV-cache management and pipelined execution to improve accuracy, reduce token usage and latency, and increase throughput.
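One plausible reading of the graph-specific KV-cache management with priority-based eviction is that cached prefixes (for example, encoded graph-node contexts shared across agents) are scored by how often they are reused and how recently they were touched, with the lowest-priority entries evicted first when the token budget is exceeded. The sketch below is a hedged illustration of such a policy; the CacheEntry fields, the priority formula, and the budget accounting are assumptions, not the paper's actual mechanism.

```python
# Hedged sketch of priority-based KV-cache eviction for Graph-CoT serving.
# The scoring rule (reuse count weighted by recency) and the CacheEntry layout
# are assumptions for illustration; GLM's actual policy may differ.

from dataclasses import dataclass

@dataclass
class CacheEntry:
    key: str          # e.g. a graph-node or agent-prefix identifier
    size_tokens: int  # KV-cache footprint of this prefix
    reuse_count: int  # how many agents / iterations have reused it
    last_step: int    # reasoning step at which it was last touched

class PriorityKVCache:
    def __init__(self, capacity_tokens: int):
        self.capacity = capacity_tokens
        self.used = 0
        self.entries: dict[str, CacheEntry] = {}

    def _priority(self, entry: CacheEntry, step: int) -> float:
        # Higher reuse and more recent use -> higher priority to keep.
        return entry.reuse_count / (1 + step - entry.last_step)

    def insert(self, entry: CacheEntry, step: int) -> None:
        # Evict lowest-priority entries until the new prefix fits
        # (a single oversized entry is still admitted in this toy version).
        while self.used + entry.size_tokens > self.capacity and self.entries:
            victim = min(self.entries.values(),
                         key=lambda e: self._priority(e, step))
            self.used -= victim.size_tokens
            del self.entries[victim.key]
        self.entries[entry.key] = entry
        self.used += entry.size_tokens
```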
Business Value
Enables more efficient and accurate reasoning over complex knowledge graphs using LLMs, which can power advanced Q&A systems, intelligent assistants, and knowledge discovery platforms.