TL;DR: We introduce Zero4D, a novel approach that generates synchronized multi-view videos from a single video using an off-the-shelf video diffusion model, without any training.
(a) Novel-view video: Generating new videos from target camera views.
(The dynamic-view and fixed-view videos are synchronized.)
(b) Bullet time video: Generating synchronized time-frozen videos at multiple time indices.
Generation pipeline of Zero4D: (a) Key frame generation step: Starting from the input video (shown as the gray-shaded row), we sequentially generate boundary frames via novel-view synthesis, end-view video generation, and end-frame view synthesis, where each step leverages the results of the previous one. (b) Spatio-temporal bidirectional interpolation step: Starting from noisy frames, we alternately perform camera-axis and time-axis interpolation, each conditioned on the boundary frames, to progressively denoise the 4D grid. Through this bidirectional process, the noisy latents are refined into globally coherent spatio-temporal videos.
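The bidirectional interpolation step can be illustrated with a minimal NumPy sketch. Here the 4D grid is collapsed to a (views, frames) array of scalar latents, the boundary (key) frames are held fixed, and simple linear interpolation stands in for the diffusion model's conditioned denoising; the function name and the averaging schedule are illustrative assumptions, not the actual Zero4D implementation.

```python
import numpy as np

def bidirectional_interpolation(grid, known_mask, num_steps=10, rng=None):
    """Sketch of spatio-temporal bidirectional interpolation.

    grid:       (V, T) array of latent values; entries where known_mask
                is True are boundary (key) frames and stay fixed.
    known_mask: (V, T) boolean array marking those boundary frames.

    Unknown entries start as Gaussian noise and are progressively
    refined by alternating a camera-axis pass (over views) and a
    time-axis pass (over frames), re-imposing the boundary frames
    after every pass. Linear interpolation is a stand-in for the
    video diffusion model's conditioned denoising.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    V, T = grid.shape
    # Initialize: keep boundary frames, fill the rest with noise.
    x = np.where(known_mask, grid, rng.standard_normal(grid.shape))
    target = np.empty_like(x)
    for _ in range(num_steps):
        # Camera-axis pass: interpolate each time column between its
        # first and last views, then blend toward that estimate.
        for t in range(T):
            target[:, t] = np.linspace(x[0, t], x[-1, t], V)
        x = np.where(known_mask, grid, 0.5 * x + 0.5 * target)
        # Time-axis pass: interpolate each view row between its
        # first and last frames, then blend again.
        for v in range(V):
            target[v, :] = np.linspace(x[v, 0], x[v, -1], T)
        x = np.where(known_mask, grid, 0.5 * x + 0.5 * target)
    return x
```

With the outer rows and columns of the grid given as boundary frames, the interior converges toward a smooth fill that is consistent along both axes, mirroring how the alternating passes drive the noisy latents toward a globally coherent grid.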