FreeVS: Generative View Synthesis on Free Driving Trajectory

Authors: Qitai Wang, Lue Fan, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the Waymo Open Dataset show that FreeVS has a strong image synthesis performance on both the recorded trajectories and novel trajectories.
Researcher Affiliation | Academia | 1 School of Future Technology, University of Chinese Academy of Sciences (UCAS); 2 NLPR, MAIS, Institute of Automation, Chinese Academy of Sciences (CASIA); 3 CUHK; 4 Center for Artificial Intelligence and Robotics, HKISI, CAS
Pseudocode | No | The paper describes the method and training process verbally and through equations, but does not include a dedicated pseudocode or algorithm block.
Open Source Code | Yes | Project Page & Code: https://freevs24.github.io/
Open Datasets | Yes | Experiments on the Waymo Open Dataset show that FreeVS has a strong image synthesis performance on both the recorded trajectories and novel trajectories.
Dataset Splits | Yes | For the front-view or multi-view novel frame synthesis benchmark (Fig. 3(a) and (b)), we sample every fourth frame in driving sequences as test frames. All the remaining frames are used for training NVS counterparts, or as input frames for FreeVS. Under the novel camera synthesis benchmark, we reserve all the front-side camera views as test views and use the front and side camera views as train views throughout each sequence.
Hardware Specification | Yes | All experiments are conducted on NVIDIA L20 GPUs. The training costs of generalizable reconstruction methods are measured on 2 RTX 3090 GPUs, while the training cost of FreeVS is measured on 8 NVIDIA L20 GPUs. Similarly, the inference efficiency of previous methods / FreeVS is measured on RTX 3090 / L20 GPUs.
Software Dependencies | No | The paper mentions using Stable Video Diffusion and Stable Diffusion checkpoints, the AdamW optimizer, and model backbones (ConvNeXt-T, the CLIP vision model), but does not provide version numbers for programming languages, libraries, or frameworks such as PyTorch or TensorFlow, which are essential for reproducibility.
Experiment Setup | Yes | We train the model for 40,000 iterations with a batch size of 8 and video frame length n = 8. We use the AdamW optimizer (Kingma & Ba, 2014) with a learning rate of 1×10⁻⁴. During training, we randomly drop the pseudo-image condition latent as well as the CLIP text description latent with a probability of 20%. We enable the viewpoint transformation simulation with a probability of 50%. During inference, we set the number of sampling steps to 25 and stochasticity η = 1.0. When synthesizing images on the existing trajectory, we set the classifier-free guidance (CFG) (Ho & Salimans, 2022) scale to 1.0. For synthesizing images on novel cameras and new trajectories, we enlarge the CFG scale to 2.0 to strengthen the control of 3D prior conditions over the generated results.
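The split rule quoted under Dataset Splits (every fourth frame held out as a test frame) can be sketched as follows. This is an illustration only: the helper name `split_frames` and the zero starting offset are assumptions, not taken from the FreeVS codebase.

```python
def split_frames(num_frames: int, test_every: int = 4):
    """Hold out every fourth frame as a test frame; the remaining frames
    serve as training frames (for NVS baselines) or as input frames (for
    FreeVS). The starting offset of 0 is an assumption; the paper only
    states 'every fourth frame'."""
    test_ids = [i for i in range(num_frames) if i % test_every == 0]
    train_ids = [i for i in range(num_frames) if i % test_every != 0]
    return train_ids, test_ids

# For a 12-frame sequence, frames 0, 4, 8 become test frames.
train_ids, test_ids = split_frames(12)
```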
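Collecting the numbers quoted under Experiment Setup into a small config makes the two inference regimes explicit. The key names and the `cfg_scale` helper below are illustrative assumptions, not the authors' released code.

```python
# Hyperparameters as reported in the paper; dictionary keys are made up.
TRAIN_CFG = {
    "iterations": 40_000,
    "batch_size": 8,
    "video_frames": 8,          # n = 8
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "cond_drop_prob": 0.20,     # drop pseudo-image / CLIP text latents
    "viewpoint_sim_prob": 0.50, # viewpoint transformation simulation
    "sampling_steps": 25,       # inference-time diffusion steps
    "eta": 1.0,                 # sampling stochasticity
}

def cfg_scale(novel_view: bool) -> float:
    """Classifier-free guidance scale per the paper: 1.0 on the recorded
    trajectory, 2.0 for novel cameras or new trajectories (stronger
    control from the 3D prior conditions)."""
    return 2.0 if novel_view else 1.0
```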