Abstract: In visual generation, discrete diffusion models are gaining traction for their efficiency and compatibility. However, pioneering attempts still fall behind their continuous counterparts, which we attribute to the design of the noise (absorbing state) and to sampling heuristics. In this study, we propose a rehashing noise approach for the discrete diffusion transformer (termed ReDDiT), aiming to extend absorbing states and improve the expressive capacity of discrete diffusion models. ReDDiT enriches the potential paths that latent variables traverse during training through randomized multi-index corruption. The derived rehash sampler, which reverses these randomized absorbing paths, guarantees high diversity and low discrepancy in the generation process. These reformulations lead to more consistent and competitive generation quality, mitigating the need for heavily tuned randomness. Experiments show that ReDDiT significantly outperforms the baseline model (reducing gFID from 6.18 to 1.61) and is on par with its continuous counterparts.
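The abstract's core mechanism — corrupting tokens into one of several absorbing (mask) states rather than a single one — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `rehash_corrupt`, the layout of the extra mask indices after the ordinary vocabulary, and the per-position Bernoulli masking are all assumptions made here for clarity.

```python
import random

def rehash_corrupt(tokens, mask_rate, num_absorbing, vocab_size, rng=None):
    """Hypothetical sketch of randomized multi-index absorbing corruption.

    Each corrupted position is sent to one of `num_absorbing` mask
    (absorbing) token ids, assumed here to occupy the index range
    [vocab_size, vocab_size + num_absorbing), instead of a single
    [MASK] token as in standard absorbing discrete diffusion.
    """
    rng = rng or random.Random(0)
    corrupted = []
    for t in tokens:
        if rng.random() < mask_rate:
            # Randomly pick one of the extra absorbing indices, so the
            # same clean sequence can traverse many corruption paths.
            corrupted.append(vocab_size + rng.randrange(num_absorbing))
        else:
            corrupted.append(t)
    return corrupted
```

With `mask_rate` driven by the diffusion timestep, repeated calls on the same sequence yield distinct corrupted variants, which is the sense in which the multi-index design "enriches the potential paths" seen during training.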