Abstract: Despite significant advancements in traditional syntactic communications
based on Shannon's theory, these methods struggle to meet the requirements of
6G immersive communications, especially under challenging transmission
conditions. With the development of generative artificial intelligence (GenAI),
progress has been made in reconstructing videos using high-level semantic
information. In this paper, we propose a scalable generative video semantic
communication framework that extracts and transmits semantic information to
achieve high-quality video reconstruction. Specifically, at the transmitter,
description and other condition signals (e.g., first frame, sketches, etc.) are
extracted from the source video, functioning as text and structural semantics,
respectively. At the receiver, diffusion-based GenAI large models are
utilized to fuse the semantics of the multiple modalities for reconstructing
the video. Simulation results demonstrate that, at an ultra-low channel
bandwidth ratio (CBR), our scheme effectively captures semantic information to
reconstruct videos aligned with human perception under different
signal-to-noise ratios. Notably, the proposed "First Frame+Desc." scheme
consistently achieves a CLIP score exceeding 0.92 at CBR = 0.0057 for SNR > 0 dB.
This demonstrates its robust performance even under low SNR conditions.
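The reported channel bandwidth ratio can be made concrete with a short sketch. This assumes the common definition from deep joint source-channel coding (CBR = channel uses divided by source dimensions) and a hypothetical 16-frame 256×256 RGB clip; neither the definition nor the clip size is stated in the abstract, so both are assumptions for illustration only:

```python
def channel_bandwidth_ratio(channel_symbols: int,
                            frames: int, height: int, width: int,
                            color_channels: int = 3) -> float:
    """CBR = number of channel symbols / number of source symbols.

    This follows the usual convention in deep joint source-channel
    coding; the paper may define CBR slightly differently.
    """
    source_symbols = frames * height * width * color_channels
    return channel_symbols / source_symbols

# Hypothetical clip: 16 frames of 256x256 RGB (not specified in the abstract).
source_symbols = 16 * 256 * 256 * 3           # 3,145,728 source values
budget = round(0.0057 * source_symbols)       # channel-symbol budget at CBR = 0.0057
print(budget)                                 # -> 17931
print(round(channel_bandwidth_ratio(budget, 16, 256, 256), 4))  # -> 0.0057
```

Under these assumptions, the entire clip must be conveyed in roughly 18k channel symbols, which is why the scheme transmits compact semantics (a text description plus a first frame or sketch) rather than pixel-level data.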