Abstract
Existing image editing methods can handle simple editing instructions very
well. To deal with complex editing instructions, they often need to jointly
fine-tune the large language models (LLMs) and diffusion models (DMs), which
involves very high computational complexity and training cost. To address this
issue, we propose a new method, called Complex Image
Editing via LLM Reasoning (CIELR), which converts a
complex user instruction into a set of simple and explicit editing actions,
eliminating the need for jointly fine-tuning the large language models and
diffusion models. Specifically, we first construct a structured semantic
representation of the input image using foundation models. Then, we introduce
an iterative update mechanism that can progressively refine this
representation, obtaining a fine-grained visual representation of the image
scene. This allows us to perform complex and flexible image editing tasks.
Extensive experiments on the SmartEdit Reasoning Scenario Set show that our
method surpasses the previous state-of-the-art by 9.955 dB in PSNR, indicating
its superior preservation of regions that should remain consistent. Because
public datasets for reasoning-based complex image editing contain few samples,
we construct a benchmark named CIEBench, containing 86 image samples, together
with a metric designed specifically for reasoning-based image editing.
CIELR also outperforms previous methods on this benchmark. The code and dataset
are available at https://github.com/Jia-shao/Reasoning-Editing.
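
For readers unfamiliar with the PSNR figure quoted in the abstract, the following is a minimal sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE). It is not taken from the paper's code: the function name `psnr`, the flat-list input format, and the default `max_value` of 255 are assumptions made for illustration, and the abstract does not state how the benchmark selects or masks the regions it measures.

```python
import math


def psnr(reference, edited, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Hypothetical helper for illustration; inputs are flat sequences of pixel values.
    """
    if len(reference) != len(edited) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel positions.
    mse = sum((r - e) ** 2 for r, e in zip(reference, edited)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# How much smaller the MSE is for a given PSNR gain (same max_value assumed).
def mse_ratio_for_gain(gain_db):
    return 10.0 ** (gain_db / 10.0)
```

Under this standard definition, and assuming the same peak value on both sides, a gain of 9.955 dB corresponds to a mean squared error that is lower by a factor of about 9.9, which `mse_ratio_for_gain(9.955)` computes.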
Authors (4)
Yijia Wang
Yiqing Shen
Weiming Chen
Zhihai He
Submitted
October 31, 2025
Key Contributions
This paper introduces CIELR, a novel method for complex image editing that leverages Large Language Models (LLMs) for reasoning without requiring joint fine-tuning with diffusion models. CIELR converts complex instructions into simple actions, significantly reducing computational cost and enabling more flexible and intuitive image manipulation.
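
The idea of turning one complex instruction into simple, explicit actions can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: `SceneObject`, `EditAction`, `plan_edits`, and `toy_reasoner` are hypothetical names, the LLM is replaced by a hard-coded stand-in, and the iterative refinement of the scene representation described in the abstract is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One entry in a (hypothetical) structured description of the image."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class EditAction:
    """A simple, explicit edit, e.g. replace 'apple' with 'orange'."""
    op: str
    target: str
    value: str = ""


def plan_edits(instruction, scene, reason):
    """Ask a reasoning callable for actions; keep those whose target is in the scene."""
    known = {obj.name for obj in scene}
    return [action for action in reason(instruction, scene) if action.target in known]


def toy_reasoner(instruction, scene):
    # Stand-in for an LLM call; the output is hard-coded for illustration only.
    return [
        EditAction("replace", "apple", "orange"),
        EditAction("remove", "unicorn"),  # not in the scene, so it gets filtered out
    ]


scene = [SceneObject("apple", {"color": "red"}), SceneObject("table")]
actions = plan_edits("Swap the fruit for one richer in vitamin C", scene, toy_reasoner)
```

Here `actions` contains only the `replace` action on `apple`; each surviving action could then be passed to an off-the-shelf editing model, so the language model and the diffusion model never need to be trained together.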
Business Value
Enables more accessible and powerful image editing tools for a wider range of users, accelerating creative workflows and reducing the barrier to entry for professional-level image manipulation.