NSARM: Next-Scale Autoregressive Modeling for Robust Real-World Image Super-Resolution

📄 Abstract

Most recent real-world image super-resolution (Real-ISR) methods employ pre-trained text-to-image (T2I) diffusion models to synthesize a high-quality image either from random Gaussian noise, which yields realistic results but is slow due to iterative denoising, or directly from the input low-quality image, which is efficient but sacrifices output quality. These approaches train ControlNet or LoRA modules while keeping the pre-trained model fixed, which often introduces over-enhanced artifacts and hallucinations and weakens robustness to inputs with varying degradations. Recent visual autoregressive (AR) models, such as the pre-trained Infinity, provide strong T2I generation capabilities while offering superior efficiency through a bitwise next-scale prediction strategy. Building upon next-scale prediction, we introduce a robust Real-ISR framework, namely Next-Scale Autoregressive Modeling (NSARM). Specifically, we train NSARM in two stages: a transformation network is first trained to map the input low-quality image to preliminary scales, followed by end-to-end fine-tuning of the full model. This comprehensive fine-tuning enhances the robustness of NSARM in Real-ISR tasks without compromising its generative capability. Extensive quantitative and qualitative evaluations demonstrate that, as a pure AR model, NSARM achieves superior visual results over existing Real-ISR methods while maintaining fast inference. Most importantly, it is much more robust to the quality of input images and generalizes better. Project page: https://github.com/Xiangtaokong/NSARM
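The two-stage recipe and the coarse-to-fine generation order described in the abstract can be sketched in a few lines of Python. This is a toy illustration only, not the authors' implementation: the module names, the `trainable_modules` and `next_scale_generate` helpers, and the `upsample_last` stand-in predictor are all hypothetical, and the sketch assumes that the preliminary scales produced from the low-quality image serve as the starting context for next-scale prediction.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical module names; the abstract only mentions a "transformation
# network" and a pre-trained AR model (Infinity).
MODULES = ("transformation_network", "ar_backbone")


def trainable_modules(stage: int) -> Set[str]:
    """Return which modules are updated in each training stage."""
    if stage == 1:
        # Stage 1: train only the network that maps the low-quality
        # image to the preliminary scales.
        return {"transformation_network"}
    if stage == 2:
        # Stage 2: end-to-end fine-tuning of the full model.
        return set(MODULES)
    raise ValueError(f"unknown stage: {stage}")


Scale = List[float]  # toy stand-in for one scale's token map


def next_scale_generate(
    preliminary: Sequence[Scale],
    num_scales: int,
    predict_next: Callable[[Sequence[Scale]], Scale],
) -> List[Scale]:
    """Extend the preliminary scales up to num_scales, one whole scale per step.

    Each step sees every scale produced so far (assumed conditioning scheme).
    """
    scales = [list(s) for s in preliminary]
    while len(scales) < num_scales:
        scales.append(predict_next(scales))
    return scales


def upsample_last(scales: Sequence[Scale]) -> Scale:
    """Toy predictor: double the resolution of the most recent scale."""
    return [value for value in scales[-1] for _ in range(2)]


if __name__ == "__main__":
    # Two preliminary scales (as if produced from the low-quality input),
    # then two more scales predicted coarse to fine.
    result = next_scale_generate([[1.0], [1.0, 2.0]], 4, upsample_last)
    print([len(s) for s in result])
```

The point of the sketch is the control flow: each step emits an entire scale rather than a single token, which is where next-scale models get their speed relative to iterative denoising. A real predictor would be the fine-tuned AR transformer rather than a fixed upsampler.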

Key Contributions

Proposes NSARM, a robust real-world image super-resolution framework built on next-scale autoregressive prediction. It targets the efficiency/quality trade-off of diffusion-based super-resolution: instead of attaching ControlNet or LoRA modules to a frozen pre-trained T2I model, it fine-tunes the full model end to end after first training a transformation network, which reduces over-enhanced artifacts and hallucinations and improves robustness to varying input degradations.

Business Value

Enables higher quality and more reliable image upscaling for various applications, from enhancing old photos to improving the clarity of surveillance footage, leading to better visual data analysis and user experience.