Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Compressing Streamable Free-Viewpoint Videos to 0.1 MB per Frame
Authors: Luyang Tang, Jiayu Yang, Rui Peng, Yongqi Zhai, Shihe Shen, Ronggang Wang
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on widely used datasets demonstrate the state-of-the-art performance of our framework in both synthesis quality and efficiency, i.e., achieving per-frame training in 13 seconds with a storage cost of 0.1 MB and real-time rendering at 120 FPS. |
| Researcher Affiliation | Collaboration | 1Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University, China 2Pengcheng Laboratory, China |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Pomelomm/iFVC |
| Open Datasets | Yes | We conduct experiments on three real-world dynamic scene datasets as follows: Neural 3D Video (N3DV) (Li et al. 2022b) ... Meet Room (Li et al. 2022a) ... VRU Basketball Game |
| Dataset Splits | Yes | Following prior works (Sun et al. 2024), we downsample the original videos by a factor of two for training and testing. [...] we utilize 12 views for training and reserve 1 for testing. [...] We utilize 30 views for training and reserve 4 for testing. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models, CPU types, or memory used for running experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers, such as programming language versions or library versions, needed to replicate the experiment. |
| Experiment Setup | Yes | The whole framework starts with sparse points from SfM at timestep 0. To obtain a high-quality and compact initial representation, we train for 15K steps on the N3DV and Meet Room datasets, and for 30K steps on the VRU dataset. For subsequent frames t (t > 0), our transformation triplane consists of 4-level 2D embeddings, whose resolutions range from 512 to 4096 and feature dimension is 4. The size of our one-channel saliency grid is 514 × 514 × 514. We implement our transformation triplane and saliency grid using binary hash encoding to reduce the storage cost. The maximum hash table size is 2^15. We train our BTC for 300 iterations and control the storage size of each frame by adjusting the weight parameter λt in the loss function (set to 0.004 by default). |
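The quoted setup (4-level 2D embeddings with resolutions from 512 to 4096, hash tables capped at 2^15 entries) follows the usual Instant-NGP-style multi-resolution layout, in which per-level resolutions grow geometrically and each level's storage is clamped to the hash-table size. A minimal sketch of that arithmetic, with hypothetical helper names not taken from the paper:

```python
def triplane_level_resolutions(n_levels=4, min_res=512, max_res=4096):
    """Per-level resolutions for a multi-level 2D embedding, spaced
    geometrically from min_res to max_res (Instant-NGP convention)."""
    if n_levels == 1:
        return [min_res]
    growth = (max_res / min_res) ** (1.0 / (n_levels - 1))
    return [int(round(min_res * growth ** i)) for i in range(n_levels)]

def level_table_entries(res, max_hashmap_size=2 ** 15):
    """Feature vectors stored for one 2D level: dense if the grid
    fits in the table, otherwise capped at the hash-table size."""
    return min(res * res, max_hashmap_size)

resolutions = triplane_level_resolutions()
tables = [level_table_entries(r) for r in resolutions]
```

Under these assumed parameters the levels come out at 512, 1024, 2048, and 4096, and every level exceeds the 2^15-entry cap, so all tables store 32,768 feature vectors — which is what keeps the per-frame storage small despite the high nominal resolutions.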