MERGE unifies image generation and depth estimation in a single pre-trained text-to-image diffusion model without catastrophic degradation of generative quality. It introduces a plug-and-play framework for switching between generation and depth-estimation modes, together with a Group Reuse Mechanism for efficient parameter utilization, demonstrating that diffusion models can extend beyond generation to perception tasks such as depth estimation.
This enables more versatile and efficient reuse of powerful pre-trained generative models for tasks beyond image synthesis, potentially reducing development costs for applications that require both image generation and spatial understanding.
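To make the plug-and-play idea concrete, here is a minimal toy sketch of mode switching over a frozen backbone. All names (`FrozenBackbone`, `ModeSwitcher`, the per-mode heads) are hypothetical illustrations of the general pattern, not the MERGE paper's actual architecture or API; the "reuse" here is only loosely analogous to the paper's Group Reuse Mechanism.

```python
class FrozenBackbone:
    """Stands in for a pre-trained text-to-image diffusion model.

    Its parameters are never fine-tuned; both modes share it as-is,
    which is what avoids degrading the original generative ability.
    """
    def features(self, x):
        # Fixed feature extraction (toy stand-in for denoising features).
        return [v * 2.0 for v in x]


class ModeSwitcher:
    """Routes shared backbone features to a lightweight per-task head.

    Switching modes swaps only the small head, so no second copy of
    the backbone is needed -- a rough analogue of parameter reuse.
    """
    def __init__(self, backbone):
        self.backbone = backbone
        self.heads = {
            "generate": lambda f: [v + 1.0 for v in f],  # image branch (toy)
            "depth":    lambda f: [abs(v) for v in f],   # depth branch (toy)
        }

    def run(self, mode, x):
        feats = self.backbone.features(x)  # computed once, shared by all modes
        return self.heads[mode](feats)


switcher = ModeSwitcher(FrozenBackbone())
print(switcher.run("generate", [1.0, -2.0]))  # [3.0, -3.0]
print(switcher.run("depth", [1.0, -2.0]))     # [2.0, 4.0]
```

The design choice the sketch illustrates: keeping the expensive backbone frozen and shared means adding a new task costs only a small head, rather than a full model copy.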