CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis
Authors: Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of an encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by the recent parameter-efficient finetuning approaches, we propose a finetuning method to efficiently adapt the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. The proposed CodecNeRF, a newly suggested encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented compression performance of more than 100× and a remarkable reduction in encoding time while maintaining (or improving) the image quality on widely used 3D object datasets. ... We have conducted comprehensive experiments using two representative 3D datasets, Objaverse (Deitke et al. 2023) and Google Scanned Objects (Downs et al. 2022). The experimental results show that the proposed encoder-decoder-finetuning method, CodecNeRF, achieved 100× more compression performance and significantly reduced encoding (training) time over the per-scene optimization baseline methods while maintaining the rendered image quality. |
| Researcher Affiliation | Academia | Gyeongjin Kang1*, Younggeun Lee2*, Seungjun Oh2, Eunbyung Park1, 2 1Department of Electrical and Computer Engineering, Sungkyunkwan University 2Department of Artificial Intelligence, Sungkyunkwan University EMAIL |
| Pseudocode | No | The paper describes the proposed architecture and methods in Section 3 and its subsections, along with supporting figures (Figure 1 and Figure 2). However, it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code https://gynjn.github.io/CodecNeRF |
| Open Datasets | Yes | To evaluate our method, we conduct experiments on 1) Objaverse (Deitke et al. 2023) and Google Scanned Objects (GSO) (Downs et al. 2022) for object-level novel view synthesis, and 2) DTU dataset (Jensen et al. 2014) for real scenes. |
| Dataset Splits | Yes | For Objaverse, we sourced images from One-2-3-45 (Liu et al. 2024), which consists of 46k objects, and constructed our own split of 36,796 training objects and 9,118 test objects. In GSO, we used 1,030 objects only for the evaluation. Lastly, we followed the PixelNeRF (Yu et al. 2021) DTU dataset split with 88 training scenes and 15 testing scenes. ... We finetuned the model using 24 images and tested it on the remaining views. Note that the 16 images used to generate the initial triplane representations are a subset of the 24 training images, and the same images are all used to train baselines for a fair comparison. For the DTU dataset, we choose 8 input images for base model training and 16 images for finetuning. |
| Hardware Specification | No | The paper provides details on model architecture and hyperparameters (e.g., resolutions, channel sizes, number of layers, codebook size, LoRA rank) in Section 4.2. However, it does not explicitly specify the hardware used for running the experiments, such as specific GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper describes various architectural components and parameters in Section 4.2, such as using a pre-trained vision transformer (ViT) and K-planes. However, it does not specify exact version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries/solvers. |
| Experiment Setup | Yes | To train our base model, we randomly choose 16 input images and camera poses to produce a triplane representation and predict the remaining novel views. We used two spatial resolutions {V1, V2} = {64, 128} and channel size C = 32 for our multi-resolution triplanes. The MLP decoders have 6 layers with hidden dimension 64 for the coarse and fine decoders, respectively. We set the codebook size K = 8192 and dimension C = 32. In the finetuning stage, we first generated an initial representation using predetermined 16 view indices and finetuned the triplane features with MLP decoders in an optimization-based approach. We finetuned the model using 24 images and tested it on the remaining views. ... For the DTU dataset, we choose 8 input images for base model training and 16 images for finetuning. We set the LoRA rank to 4 in every layer of the decoder. |
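The reported hyperparameters (multi-resolution triplanes at {64, 128} with C = 32 channels, a 6-layer MLP decoder with hidden dimension 64, and rank-4 LoRA adapters on every decoder layer) can be sketched as follows. This is a minimal NumPy illustration of those shapes and of the LoRA mechanism, not the authors' implementation; all function and variable names are assumptions.

```python
import numpy as np

CHANNELS = 32          # triplane feature channels C (per the paper)
RESOLUTIONS = (64, 128)  # multi-resolution triplane sizes {V1, V2}

def make_triplanes(resolutions=RESOLUTIONS, channels=CHANNELS, seed=0):
    """Three axis-aligned feature planes (xy, xz, yz) at each resolution."""
    rng = np.random.default_rng(seed)
    return {r: rng.standard_normal((3, channels, r, r)) * 0.01
            for r in resolutions}

class LoRALinear:
    """Linear layer whose base weight W stays frozen during finetuning;
    only the rank-r update B @ A is adapted per test instance."""
    def __init__(self, in_dim, out_dim, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.01  # frozen base
        self.A = rng.standard_normal((rank, in_dim)) * 0.01     # trainable
        self.B = np.zeros((out_dim, rank))  # trainable, zero-init => no-op at start

    def __call__(self, x):
        return x @ (self.W + self.B @ self.A).T

def make_decoder(in_dim=CHANNELS * len(RESOLUTIONS), hidden=64,
                 depth=6, out_dim=4, rank=4):
    """6-layer MLP decoder with LoRA on every layer; output is RGB + density
    (the 4-channel output head is an illustrative assumption)."""
    dims = [in_dim] + [hidden] * (depth - 1) + [out_dim]
    return [LoRALinear(dims[i], dims[i + 1], rank=rank, seed=i)
            for i in range(depth)]

def decode(layers, x):
    """Run the MLP with ReLU between layers and a raw final output."""
    for layer in layers[:-1]:
        x = np.maximum(layer(x), 0.0)
    return layers[-1](x)
```

Because B is zero-initialized, the generated (base) representation renders identically before finetuning, and per-instance adaptation only needs to store the small A and B matrices, which is consistent with the compact-code-size claim in the review above.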