📄 Abstract
We present a novel task: text-to-3D sketch animation, which aims to bring
freeform sketches to life in dynamic 3D space. Unlike prior works focused on
photorealistic content generation, we target sparse, stylized, and
view-consistent 3D vector sketches, a lightweight and interpretable medium
well-suited for visual communication and prototyping. However, this task is
very challenging: (i) no paired dataset exists for text and 3D (or 4D)
sketches; (ii) sketches require structural abstraction that is difficult to
model with conventional 3D representations like NeRFs or point clouds; and
(iii) animating such sketches demands temporal coherence and multi-view
consistency, which current pipelines do not address. Therefore, we propose
4-Doodle, the first training-free framework for generating dynamic 3D sketches
from text. It leverages pretrained image and video diffusion models through a
dual-space distillation scheme: one space captures multi-view-consistent
geometry using differentiable Bézier curves, while the other encodes motion
dynamics via temporally-aware priors. Unlike prior work (e.g., DreamFusion),
which optimizes from a single view per step, our multi-view optimization
ensures structural alignment and avoids view ambiguity, which is critical for sparse
sketches. Furthermore, we introduce a structure-aware motion module that
separates shape-preserving trajectories from deformation-aware changes,
enabling expressive motion such as flipping, rotation, and articulated
movement. Extensive experiments show that our method produces temporally
realistic and structurally stable 3D sketch animations, outperforming existing
baselines in both fidelity and controllability. We hope this work serves as a
step toward more intuitive and accessible 4D content creation.
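The abstract's choice of differentiable Bézier curves as the sketch representation can be illustrated with a minimal sketch (not the authors' implementation; the function name and shapes are illustrative): a cubic Bézier stroke in 3D is a weighted sum of four control points under the Bernstein basis, so every sampled point is differentiable with respect to the control points, which is what lets gradients from a distillation loss flow back into the sketch geometry.

```python
import numpy as np

def bezier_points(ctrl, n=64):
    """Sample n points along a cubic Bezier curve.

    ctrl: array of shape (4, d) -- four control points in d-dimensional space.
    Returns an array of shape (n, d).
    """
    t = np.linspace(0.0, 1.0, n)[:, None]   # curve parameter in [0, 1]
    b0 = (1 - t) ** 3                        # Bernstein basis polynomials
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3
    return b0 * ctrl[0] + b1 * ctrl[1] + b2 * ctrl[2] + b3 * ctrl[3]

# One 3D stroke: the curve interpolates the first and last control points.
stroke = bezier_points(np.array([[0., 0., 0.],
                                 [1., 2., 0.],
                                 [2., 2., 1.],
                                 [3., 0., 1.]]))
print(stroke.shape)  # (64, 3)
```

Because the sampling is a closed-form polynomial in the control points, an autodiff framework can optimize the control points directly from any loss defined on the rendered curve.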
Authors (6)
Hao Chen
Jiaqi Wang
Yonggang Qi
Ke Li
Kaiyue Pang
Yi-Zhe Song
Submitted
October 29, 2025
Key Contributions
4-Doodle is the first training-free framework for generating dynamic 3D sketches from text. It addresses challenges like the lack of paired datasets and the difficulty of modeling sketch abstraction by leveraging pretrained diffusion models through a dual-space distillation scheme for multi-view consistency and temporal coherence.
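The multi-view distillation idea described above can be sketched as a toy optimization loop. This is an assumption-laden illustration, not the paper's method: a simple quadratic pull-toward-target stands in for a frozen diffusion model's score, and a linear projection stands in for the differentiable curve renderer. The point is the structure: per step, each view's prior gradient is backpropagated through its projection and accumulated into one update on the shared 3D control points, so all views constrain the same geometry.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(ctrl, view):
    # Stand-in differentiable renderer: orthographic projection to 2D.
    return ctrl @ view                     # (P, 3) @ (3, 2) -> (P, 2)

def prior_grad(img, target):
    # Stand-in for a frozen prior's score: gradient of 0.5*||img - target||^2.
    return img - target

ctrl = rng.normal(size=(4, 3))             # one stroke's 3D control points
views = [np.eye(3)[:, :2],                 # front view (x, y plane)
         np.eye(3)[:, [0, 2]]]             # top view (x, z plane)
targets = [np.zeros((4, 2)) for _ in views]

lr = 0.1
for step in range(200):
    grad = np.zeros_like(ctrl)
    for view, target in zip(views, targets):
        # Backpropagate each view's prior gradient through the projection.
        grad += prior_grad(render(ctrl, view), target) @ view.T
    ctrl -= lr * grad                      # one shared update for all views
```

With both views covering all three axes, the shared control points converge toward the (toy) target in every view, which is the multi-view consistency the framework relies on; in 4-Doodle the quadratic pull is replaced by gradients distilled from pretrained image and video diffusion models.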
Business Value
Democratizes 3D content creation by enabling users to generate animated 3D sketches from simple text prompts, accelerating prototyping and enhancing creative expression.