📄 Abstract
Most recent real-world image super-resolution (Real-ISR) methods employ
pre-trained text-to-image (T2I) diffusion models to synthesize the high-quality
image either from random Gaussian noise, which yields realistic results but is
slow due to iterative denoising, or directly from the input low-quality image,
which is efficient but at the price of lower output quality. These approaches
train ControlNet or LoRA modules while keeping the pre-trained model fixed,
which often introduces over-enhanced artifacts and hallucinations and leaves
them with poor robustness to inputs with varying degradations. Recent visual
autoregressive (AR) models, such as pre-trained Infinity, can provide strong
T2I generation capabilities while offering superior efficiency by using the
bitwise next-scale prediction strategy. Building upon next-scale prediction, we
introduce a robust Real-ISR framework, namely Next-Scale Autoregressive
Modeling (NSARM). Specifically, we train NSARM in two stages: a transformation
network is first trained to map the input low-quality image to preliminary
scales, followed by end-to-end full-model fine-tuning. Such comprehensive
fine-tuning enhances the robustness of NSARM in Real-ISR tasks without
compromising its generative capability. Extensive quantitative and qualitative
evaluations demonstrate that, as a pure AR model, NSARM achieves superior visual
results over existing Real-ISR methods while maintaining a fast inference
speed. Most importantly, it demonstrates much higher robustness to the quality
of input images, showing stronger generalization performance. Project page:
https://github.com/Xiangtaokong/NSARM
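
To make the coarse-to-fine idea behind next-scale prediction more concrete, here is a minimal sketch of a multi-scale residual decomposition: an image is split into a sequence of maps from coarse to fine, and summing the upsampled maps rebuilds the image. This is an illustration only, not the NSARM or Infinity implementation. It has no tokenizer, no bitwise quantization, and no transformer, and the function names (`downsample`, `upsample`, `decompose`, `reconstruct`) and the scale list are hypothetical choices made for the example.

```python
# Illustrative sketch only: a coarse-to-fine residual pyramid on a square
# grayscale image stored as a list of rows. All names here are hypothetical.
# Assumes every scale divides the image side length evenly.

def downsample(img, size):
    """Area-average a square image down to size x size."""
    block = len(img) // size
    area = block * block
    return [
        [
            sum(img[r * block + i][c * block + j]
                for i in range(block) for j in range(block)) / area
            for c in range(size)
        ]
        for r in range(size)
    ]

def upsample(img, size):
    """Nearest-neighbour upsample a square image to size x size."""
    block = size // len(img)
    return [[img[r // block][c // block] for c in range(size)]
            for r in range(size)]

def reconstruct(maps, n):
    """Sum the upsampled maps to get an n x n image."""
    recon = [[0.0] * n for _ in range(n)]
    for m in maps:
        up = upsample(m, n)
        recon = [[recon[r][c] + up[r][c] for c in range(n)] for r in range(n)]
    return recon

def decompose(img, scales):
    """Split img into one residual map per scale, coarse to fine."""
    n = len(img)
    maps = []
    for s in scales:
        recon = reconstruct(maps, n)  # what the coarser scales already explain
        residual = [[img[r][c] - recon[r][c] for c in range(n)]
                    for r in range(n)]
        maps.append(downsample(residual, s))
    return maps

if __name__ == "__main__":
    image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    scale_maps = decompose(image, [1, 2, 4])
    print([len(m) for m in scale_maps])      # side length of each scale map
    print(reconstruct(scale_maps[:2], 4))    # coarse approximation
    print(reconstruct(scale_maps, 4))        # all scales combined
```

In this toy setting the first few maps carry the low-frequency content and later maps add detail. The abstract's first training stage, which maps the low-quality input to preliminary scales, can be read loosely as supplying the coarse end of such a sequence, with the model generating the finer scales; how NSARM actually does this is described in the paper, not here.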
Key Contributions
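Proposes NSARM, a robust real-world image super-resolution (Real-ISR) framework built on next-scale autoregressive prediction. It addresses the trade-off between efficiency and quality in diffusion-based super-resolution: instead of training ControlNet or LoRA modules on a frozen pre-trained model, it uses two-stage training (a transformation network that maps the low-quality input to preliminary scales, then end-to-end full-model fine-tuning), which reduces over-enhancement artifacts and hallucinations and improves robustness to varying input degradations.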
Business Value
Enables higher-quality, more reliable image upscaling for applications ranging from restoring old photos to improving the clarity of surveillance footage, which supports better visual data analysis and user experience.