VISTA: A Test-Time Self-Improving Video Generation Agent

Abstract

Despite rapid advances in text-to-video synthesis, generated video quality remains critically dependent on precise user prompts. Existing test-time optimization methods, successful in other domains, struggle with the multi-faceted nature of video. In this work, we introduce VISTA (Video Iterative Self-improvemenT Agent), a novel multi-agent system that autonomously improves video generation by refining prompts in an iterative loop. VISTA first decomposes a user idea into a structured temporal plan. After generation, the best video is identified through a robust pairwise tournament. This winning video is then critiqued by a trio of specialized agents focusing on visual, audio, and contextual fidelity. Finally, a reasoning agent synthesizes this feedback to introspectively rewrite and enhance the prompt for the next generation cycle. Experiments on single- and multi-scene video generation scenarios show that while prior methods yield inconsistent gains, VISTA consistently improves video quality and alignment with user intent, achieving a pairwise win rate of up to 60% against state-of-the-art baselines. Human evaluators concur, preferring VISTA outputs in 66.4% of comparisons.
Authors (6)
Do Xuan Long
Xingchen Wan
Hootan Nakhost
Chen-Yu Lee
Tomas Pfister
Sercan Ö. Arık
Submitted
October 17, 2025
arXiv Category
cs.CV

Key Contributions

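VISTA is a multi-agent system that autonomously improves text-to-video generation at test time through iterative prompt refinement. Each cycle has four steps: it decomposes the user's idea into a structured temporal plan, generates candidate videos and selects the best one through a pairwise tournament, has three specialized agents critique the winner on visual, audio, and contextual fidelity, and uses a reasoning agent to rewrite the prompt for the next generation cycle. In the reported experiments, this loop improves video quality consistently, whereas prior test-time methods yield inconsistent gains.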
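
The cycle described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: every name here (`Candidate`, `tournament`, `self_improve`, and the `plan`, `generate`, `better`, `critics`, and `rewrite` callables) is hypothetical. A simple sequential knockout stands in for the paper's pairwise tournament, and carrying the previous winner into the next round is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    prompt: str  # prompt that produced the video
    video: str   # placeholder handle for a generated video


def tournament(cands: List[Candidate],
               better: Callable[[Candidate, Candidate], bool]) -> Candidate:
    """Sequential knockout: `better(a, b)` is True when a beats b.

    Assumption: stands in for the pairwise tournament named in the abstract.
    """
    winner = cands[0]
    for challenger in cands[1:]:
        if better(challenger, winner):
            winner = challenger
    return winner


def self_improve(idea: str,
                 plan: Callable[[str], str],
                 generate: Callable[[str], str],
                 better: Callable[[Candidate, Candidate], bool],
                 critics: List[Callable[[Candidate], str]],
                 rewrite: Callable[[str, List[str]], str],
                 rounds: int = 3,
                 samples: int = 4) -> Candidate:
    """Plan, generate, select, critique, rewrite; repeat for `rounds` cycles."""
    prompt = plan(idea)  # step 1: idea -> structured temporal plan
    best: Optional[Candidate] = None
    for _ in range(rounds):
        # step 2: sample several videos from the current prompt
        cands = [Candidate(prompt, generate(prompt)) for _ in range(samples)]
        if best is not None:
            cands.append(best)  # assumption: the previous winner competes again
        best = tournament(cands, better)
        # step 3: each critic (visual, audio, context) reviews the winner
        feedback = [critic(best) for critic in critics]
        # step 4: a reasoning step turns the feedback into a new prompt
        prompt = rewrite(best.prompt, feedback)
    return best
```

In practice, `generate` would wrap a text-to-video model, and `better`, the critics, and `rewrite` would each wrap a multimodal or language model call. They are left as plain callables so the control flow can be exercised with toy stand-ins.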

Business Value

VISTA improves the quality and controllability of AI-generated video while reducing the need for expert prompt engineering and repeated manual adjustment, which could make it useful to content creators, marketers, and filmmakers.