Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Compressing Streamable Free-Viewpoint Videos to 0.1 MB per Frame
Authors: Luyang Tang, Jiayu Yang, Rui Peng, Yongqi Zhai, Shihe Shen, Ronggang Wang
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on widely used datasets demonstrate the state-of-the-art performance of our framework in both synthesis quality and efficiency, i.e., achieving per-frame training in 13 seconds with a storage cost of 0.1 MB and real-time rendering at 120 FPS. |
| Researcher Affiliation | Collaboration | 1Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China 2Pengcheng Laboratory, China |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Pomelomm/iFVC |
| Open Datasets | Yes | We conduct experiments on three real-world dynamic scene datasets as follows: Neural 3D Video (N3DV) (Li et al. 2022b) ... Meet Room (Li et al. 2022a) ... VRU Basketball Game |
| Dataset Splits | Yes | Following prior works (Sun et al. 2024), we downsample the original videos by a factor of two for training and testing. [...] we utilize 12 views for training and reserve 1 for testing. [...] We utilize 30 views for training and reserve 4 for testing. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models, CPU types, or memory used for running experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers, such as programming language versions or library versions, needed to replicate the experiment. |
| Experiment Setup | Yes | The whole framework starts with sparse points from SfM at timestep 0. To obtain a high-quality and compact initial representation, we train for 15K steps on the N3DV and Meet Room datasets, and for 30K steps on the VRU dataset. For subsequent frames t (t > 0), our transformation triplane consists of 4-level 2D embeddings, whose resolutions range from 512 to 4096 and feature dimension is 4. The size of our one-channel saliency grid is 514 × 514 × 514. We implement our transformation triplane and saliency grid using binary hash encoding to reduce the storage cost. The maximum hash table size is 2^15. We train our BTC for 300 iterations and control the storage size of each frame by adjusting the weight parameter λt in the loss function (set to 0.004 by default). |
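The quoted setup (4-level 2D embeddings with resolutions from 512 to 4096, hash tables capped at 2^15 entries) follows the usual Instant-NGP-style multi-resolution layout, in which per-level resolutions grow geometrically and each level's storage is clamped to the hash-table size. A minimal sketch of that arithmetic, with hypothetical helper names not taken from the paper:

```python
def triplane_level_resolutions(n_levels=4, min_res=512, max_res=4096):
    """Per-level resolutions for a multi-level 2D embedding, spaced
    geometrically from min_res to max_res (Instant-NGP convention)."""
    if n_levels == 1:
        return [min_res]
    growth = (max_res / min_res) ** (1.0 / (n_levels - 1))
    return [int(round(min_res * growth ** i)) for i in range(n_levels)]

def level_table_entries(res, max_hashmap_size=2 ** 15):
    """Feature vectors stored for one 2D level: dense if the grid
    fits in the table, otherwise capped at the hash-table size."""
    return min(res * res, max_hashmap_size)

resolutions = triplane_level_resolutions()
tables = [level_table_entries(r) for r in resolutions]
```

Under these assumed parameters the levels come out at 512, 1024, 2048, and 4096, and every level exceeds the 2^15-entry cap, so all tables store 32,768 feature vectors — which is what keeps the per-frame storage small despite the high nominal resolutions.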