Filter-Guided Diffusion for Controllable Image Generation

Abstract

Recent advances in diffusion-based generative models have shown incredible promise for zero-shot image-to-image translation and editing. Most of these approaches work by combining or replacing network-specific features used in the generation of new images with those taken from the inversion of some guide image. Methods of this type are considered the current state of the art in training-free approaches, but have some notable limitations: they tend to be costly in runtime and memory, and often depend on deterministic sampling that limits variation in generated results. We propose Filter-Guided Diffusion (FGD), an alternative approach that leverages fast filtering operations during the diffusion process to support finer control over the strength and frequencies of guidance and can work with non-deterministic samplers to produce greater variety. With its efficiency, FGD can be sampled over multiple seeds and hyperparameters in less time than a single run of other SOTA methods to produce superior results based on structural and semantic metrics. We conduct extensive quantitative and qualitative experiments to evaluate the performance of FGD in translation tasks and also demonstrate its potential in localized editing when used with masks. Project page: https://filterguideddiffusion.github.io/

Key Contributions

The paper introduces Filter-Guided Diffusion (FGD), a training-free approach to controllable image generation with diffusion models. FGD applies fast filtering operations during the diffusion process to give finer control over the strength and frequencies of guidance, and it works with non-deterministic samplers to produce greater variety. The method targets two limitations of existing training-free diffusion-based editing techniques: their runtime and memory costs, and the limited variation that results from relying on deterministic sampling.
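
The idea of controlling guidance by strength and frequency can be illustrated with a small sketch. This is not the authors' algorithm: the summary does not say which filter FGD uses or how the guidance is applied. The sketch assumes a Gaussian low-pass filter and a simple additive update, and the names low_pass, filter_guided_step, strength, and sigma are hypothetical. It nudges only the low-frequency content of an intermediate image toward that of a guide image, leaving fine detail to the sampler.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def low_pass(img, sigma):
    """Separable Gaussian blur of a 2-D array, using reflect padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    blur = lambda v: np.convolve(v, k, mode="valid")
    rows = np.apply_along_axis(blur, 1, padded)  # filter along the width
    return np.apply_along_axis(blur, 0, rows)    # filter along the height


def filter_guided_step(x, guide, strength, sigma):
    """Assumed update rule (illustrative only): move the low frequencies
    of x toward those of the guide. `strength` scales the correction and
    `sigma` sets which frequencies count as "low"."""
    return x + strength * (low_pass(guide, sigma) - low_pass(x, sigma))


# Toy usage: random arrays stand in for a guide image and an
# intermediate sample; no diffusion model is involved here.
rng = np.random.default_rng(0)
guide = rng.normal(size=(32, 32))
x = rng.normal(size=(32, 32))

before = np.linalg.norm(low_pass(guide, 2.0) - low_pass(x, 2.0))
x_new = filter_guided_step(x, guide, strength=0.8, sigma=2.0)
after = np.linalg.norm(low_pass(guide, 2.0) - low_pass(x_new, 2.0))
print("low-frequency gap before:", before, "after:", after)
```

In this toy setup, a larger sigma restricts guidance to coarser structure, and a smaller strength leaves more freedom to whatever generates x. Because the update is a cheap filter rather than a feature swap inside the network, a step like this could in principle be repeated across many seeds at low cost, which is the kind of trade-off the abstract describes.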

Business Value

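Because FGD is training-free and, according to the abstract, cheaper in runtime and memory than comparable methods, it could speed up the creation of varied visual content in areas such as graphic design, advertising, and game development. Control over the strength and frequencies of guidance, together with mask-based localized editing, gives users more direct creative control in image generation and editing tasks.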