
Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving

Abstract

Graph Chain-of-Thought (Graph-CoT) enables large language models (LLMs) to perform step-by-step reasoning over graph-structured knowledge, but existing pipelines suffer from low accuracy, excessive token usage, high latency, and low throughput due to single-agent monolithic prompts, repeated context re-encoding, and inefficient serving execution. We present GLM, the first multi-agent Graph-CoT system co-designed with an optimized LLM serving architecture. GLM decomposes reasoning into specialized agents for classification, reasoning, action generation, and graph retrieval, enabling branching and selective context sharing to reduce prompt length and reasoning iterations while preserving reasoning quality, thereby improving accuracy and reducing overall token consumption. To scale inference, we introduce a Graph-CoT-aware LLM inference mechanism with graph-specific KV-cache management, priority-based eviction, and pipelined execution to improve serving efficiency. Experiments demonstrate that GLM improves answer accuracy by up to 38%, reduces token cost by up to 95.7%, lowers inference latency by 90.3%, and achieves up to 15.1x higher throughput compared to state-of-the-art Graph-CoT baselines, enabling efficient adoption for complex real-world reasoning at scale.
Authors (14)
Chengying Huan
Ziheng Meng
Yongchao Liu
Zhengyi Yang
Yun Zhu
Yue Yun
+8 more
Submitted
November 3, 2025
arXiv Category
cs.LG
arXiv PDF

Key Contributions

Introduces GLM, the first multi-agent Graph-CoT system integrated with an optimized LLM serving architecture. It decomposes reasoning into specialized agents and employs graph-specific KV-cache management and pipelined execution to improve accuracy, reduce token usage and latency, and increase throughput.
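One serving-side idea named in the contributions is priority-based KV-cache eviction. A minimal sketch of that general technique is below, assuming a scheme where shared graph-context prefixes are scored higher than per-agent scratch prompts; the class, scoring, and API are assumptions for illustration, not the paper's design.

```python
# Illustrative priority-based eviction for cached KV prefixes; the
# priority scheme and interface are assumptions, not GLM's actual code.
import heapq

class PriorityKVCache:
    """Holds at most `capacity` cached prefixes; when full, evicts the
    entry with the lowest priority rather than the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # prefix_id -> (priority, kv_blocks)
        self.heap = []     # min-heap of (priority, prefix_id); may hold stale items

    def put(self, prefix_id, kv_blocks, priority):
        # e.g. a graph context reused by several agents gets a high
        # priority, so it outlives one-off per-agent prompts.
        self.entries[prefix_id] = (priority, kv_blocks)
        heapq.heappush(self.heap, (priority, prefix_id))
        while len(self.entries) > self.capacity:
            self._evict_lowest()

    def _evict_lowest(self):
        # Pop until we find a live, non-stale entry, then drop it.
        while self.heap:
            priority, pid = heapq.heappop(self.heap)
            if pid in self.entries and self.entries[pid][0] == priority:
                del self.entries[pid]
                return

    def get(self, prefix_id):
        entry = self.entries.get(prefix_id)
        return entry[1] if entry else None
```

The design choice a priority policy captures, versus plain LRU, is that a prefix's value depends on how many future agent calls will reuse it, not on when it was last touched.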

Business Value

Enables more efficient and accurate reasoning over complex knowledge graphs using LLMs, which can power advanced Q&A systems, intelligent assistants, and knowledge discovery platforms.