📄 Abstract
We present a novel task: text-to-3D sketch animation, which aims to bring
freeform sketches to life in dynamic 3D space. Unlike prior works focused on
photorealistic content generation, we target sparse, stylized, and
view-consistent 3D vector sketches, a lightweight and interpretable medium
well-suited for visual communication and prototyping. However, this task is
very challenging: (i) no paired dataset exists for text and 3D (or 4D)
sketches; (ii) sketches require structural abstraction that is difficult to
model with conventional 3D representations like NeRFs or point clouds; and
(iii) animating such sketches demands temporal coherence and multi-view
consistency, which current pipelines do not address. Therefore, we propose
4-Doodle, the first training-free framework for generating dynamic 3D sketches
from text. It leverages pretrained image and video diffusion models through a
dual-space distillation scheme: one space captures multi-view-consistent
geometry using differentiable Bézier curves, while the other encodes motion
dynamics via temporally-aware priors. Unlike prior work (e.g., DreamFusion),
which optimizes from a single view per step, our multi-view optimization
ensures structural alignment and avoids view ambiguity, which is critical for sparse
sketches. Furthermore, we introduce a structure-aware motion module that
separates shape-preserving trajectories from deformation-aware changes,
enabling expressive motion such as flipping, rotation, and articulated
movement. Extensive experiments show that our method produces temporally
realistic and structurally stable 3D sketch animations, outperforming existing
baselines in both fidelity and controllability. We hope this work serves as a
step toward more intuitive and accessible 4D content creation.
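The abstract's choice of differentiable Bézier curves as the sketch representation can be illustrated with a minimal sketch (not the authors' implementation; the function name and shapes are illustrative): a cubic Bézier stroke in 3D is a weighted sum of four control points under the Bernstein basis, so every sampled point is differentiable with respect to the control points, which is what lets gradients from a distillation loss flow back into the sketch geometry.

```python
import numpy as np

def bezier_points(ctrl, n=64):
    """Sample n points along a cubic Bezier curve.

    ctrl: array of shape (4, d) -- four control points in d-dimensional space.
    Returns an array of shape (n, d).
    """
    t = np.linspace(0.0, 1.0, n)[:, None]   # curve parameter in [0, 1]
    b0 = (1 - t) ** 3                        # Bernstein basis polynomials
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3
    return b0 * ctrl[0] + b1 * ctrl[1] + b2 * ctrl[2] + b3 * ctrl[3]

# One 3D stroke: the curve interpolates the first and last control points.
stroke = bezier_points(np.array([[0., 0., 0.],
                                 [1., 2., 0.],
                                 [2., 2., 1.],
                                 [3., 0., 1.]]))
print(stroke.shape)  # (64, 3)
```

Because the sampling is a closed-form polynomial in the control points, an autodiff framework can optimize the control points directly from any loss defined on the rendered curve.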
Authors (6)
Hao Chen
Jiaqi Wang
Yonggang Qi
Ke Li
Kaiyue Pang
Yi-Zhe Song
Submitted
October 29, 2025
Key Contributions
4-Doodle is the first training-free framework for generating dynamic 3D sketches from text. It addresses challenges like the lack of paired datasets and the difficulty of modeling sketch abstraction by leveraging pretrained diffusion models through a dual-space distillation scheme for multi-view consistency and temporal coherence.
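The multi-view distillation idea described above can be sketched as a toy optimization loop. This is an assumption-laden illustration, not the paper's method: a simple quadratic pull-toward-target stands in for a frozen diffusion model's score, and a linear projection stands in for the differentiable curve renderer. The point is the structure: per step, each view's prior gradient is backpropagated through its projection and accumulated into one update on the shared 3D control points, so all views constrain the same geometry.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(ctrl, view):
    # Stand-in differentiable renderer: orthographic projection to 2D.
    return ctrl @ view                     # (P, 3) @ (3, 2) -> (P, 2)

def prior_grad(img, target):
    # Stand-in for a frozen prior's score: gradient of 0.5*||img - target||^2.
    return img - target

ctrl = rng.normal(size=(4, 3))             # one stroke's 3D control points
views = [np.eye(3)[:, :2],                 # front view (x, y plane)
         np.eye(3)[:, [0, 2]]]             # top view (x, z plane)
targets = [np.zeros((4, 2)) for _ in views]

lr = 0.1
for step in range(200):
    grad = np.zeros_like(ctrl)
    for view, target in zip(views, targets):
        # Backpropagate each view's prior gradient through the projection.
        grad += prior_grad(render(ctrl, view), target) @ view.T
    ctrl -= lr * grad                      # one shared update for all views
```

With both views covering all three axes, the shared control points converge toward the (toy) target in every view, which is the multi-view consistency the framework relies on; in 4-Doodle the quadratic pull is replaced by gradients distilled from pretrained image and video diffusion models.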
Business Value
Democratizes 3D content creation by enabling users to generate animated 3D sketches from simple text prompts, accelerating prototyping and enhancing creative expression.