WikiVideo: Article Generation from Multiple Videos

Abstract

We introduce the task of grounded article generation with the goal of creating a Wikipedia-style article from multiple diverse videos about real-world events -- from natural disasters to political elections -- where all the information in the article is supported by video evidence. Videos are intuitive sources for retrieval-augmented generation (RAG), but most contemporary RAG workflows focus heavily on text while existing methods for video-based summarization focus on low-level scene understanding rather than high-level event semantics. To close this gap, we introduce WikiVideo, a benchmark consisting of expert-written articles and densely annotated videos that provide evidence for articles' claims, facilitating the integration of video into RAG pipelines and enabling the creation of in-depth content that is grounded in multimodal sources. We further propose Collaborative Article Generation (CAG), a novel interactive method for article creation from multiple videos. CAG leverages an iterative interaction between an r1-style reasoning model and a VideoLLM to draw higher-level inferences about the target event than is possible with VideoLLMs alone, which fixate on low-level visual features. We benchmark state-of-the-art VideoLLMs and CAG in both oracle retrieval and RAG settings and find that CAG consistently outperforms alternative methods, while suggesting intriguing avenues for future work.
Authors (8)
Alexander Martin
Reno Kriz
William Gantt Walden
Kate Sanders
Hannah Recknor
Eugene Yang
+2 more
Submitted
April 1, 2025
arXiv Category
cs.CV

Key Contributions

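Introduces WikiVideo, a benchmark for grounded article generation from multiple videos that pairs expert-written articles with densely annotated videos supporting the articles' claims. Proposes Collaborative Article Generation (CAG), an interactive method in which an r1-style reasoning model iterates with a VideoLLM to draw higher-level inferences about the target event. The work addresses the text-heavy focus of current RAG workflows by targeting high-level event semantics in video, so that the information in a generated article is supported by video evidence.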
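
The iterative interaction described for CAG can be pictured with a minimal sketch. This is not the authors' implementation: the function names, signatures, stopping rule, and prompt wording below are all hypothetical, and the stub functions merely stand in for a real VideoLLM and reasoning model so that the loop runs on its own.

```python
from typing import Callable, List

# Hypothetical interfaces; none of these signatures come from the paper.
AskVideo = Callable[[str, str], str]          # (video_id, question) -> VideoLLM answer
NextQuestion = Callable[[str, List[str]], str]  # (topic, notes) -> follow-up, "" to stop
WriteArticle = Callable[[str, List[str]], str]  # (topic, notes) -> article text


def collaborative_generate(
    topic: str,
    video_ids: List[str],
    ask_video: AskVideo,
    next_question: NextQuestion,
    write_article: WriteArticle,
    max_rounds: int = 3,
) -> str:
    """Alternate between querying each video and asking the reasoner what to ask next."""
    notes: List[str] = []
    question = f"What happens in this video related to {topic}?"  # assumed opening prompt
    for _ in range(max_rounds):
        for vid in video_ids:
            # Tag each answer with its source video so claims stay traceable.
            notes.append(f"[{vid}] {ask_video(vid, question)}")
        question = next_question(topic, notes)
        if not question:  # assumed stopping rule: the reasoner has nothing more to ask
            break
    return write_article(topic, notes)


# Stand-in stubs so the sketch is runnable without any model.
def stub_ask(video_id: str, question: str) -> str:
    return f"answer to '{question}'"


def stub_next(topic: str, notes: List[str]) -> str:
    return "Who was affected?" if len(notes) < 4 else ""


def stub_write(topic: str, notes: List[str]) -> str:
    return topic + "\n" + "\n".join(notes)


if __name__ == "__main__":
    print(collaborative_generate("Flood", ["v1", "v2"], stub_ask, stub_next, stub_write))
```

Tagging every note with its video identifier is one simple way to keep each statement in the final article traceable to a source video, in the spirit of the grounding requirement; how the paper itself tracks evidence is not stated on this page.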

Business Value

Supports automated creation of informative articles grounded in video evidence, which is valuable for news organizations, educational platforms, and content creators.