CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis
Authors: Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present CodecNeRF, a neural codec for NeRF representations, consisting of an encoder and decoder architecture that can generate a NeRF representation in a single forward pass. Furthermore, inspired by the recent parameter-efficient finetuning approaches, we propose a finetuning method to efficiently adapt the generated NeRF representations to a new test instance, leading to high-quality image renderings and compact code sizes. The proposed CodecNeRF, a newly suggested encoding-decoding-finetuning pipeline for NeRF, achieved unprecedented compression performance of more than 100× and a remarkable reduction in encoding time while maintaining (or improving) the image quality on widely used 3D object datasets. ... We have conducted comprehensive experiments using two representative 3D datasets, Objaverse (Deitke et al. 2023) and Google Scanned Objects (Downs et al. 2022). The experimental results show that the proposed encoder-decoder-finetuning method, CodecNeRF, achieved 100× more compression performance and significantly reduced encoding (training) time over the per-scene optimization baseline methods while maintaining the rendered image quality. |
| Researcher Affiliation | Academia | Gyeongjin Kang1*, Younggeun Lee2*, Seungjun Oh2, Eunbyung Park1, 2 1Department of Electrical and Computer Engineering, Sungkyunkwan University 2Department of Artificial Intelligence, Sungkyunkwan University EMAIL |
| Pseudocode | No | The paper describes the proposed architecture and methods in Section 3 and its subsections, along with supporting figures (Figure 1 and Figure 2). However, it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code https://gynjn.github.io/CodecNeRF |
| Open Datasets | Yes | To evaluate our method, we conduct experiments on 1) Objaverse (Deitke et al. 2023) and Google Scanned Objects (GSO) (Downs et al. 2022) for object-level novel view synthesis, and 2) DTU dataset (Jensen et al. 2014) for real scenes. |
| Dataset Splits | Yes | For Objaverse, we sourced images from One-2-3-45 (Liu et al. 2024), which consists of 46k objects, and constructed our own split of 36,796 training objects and 9,118 test objects. In GSO, we used 1,030 objects only for the evaluation. Lastly, we followed the PixelNeRF (Yu et al. 2021) DTU dataset split with 88 training scenes and 15 testing scenes. ... We finetuned the model using 24 images and tested it on the remaining views. Note that the 16 images used to generate the initial triplane representations are a subset of the 24 training images, and the same images are all used to train baselines for a fair comparison. For the DTU dataset, we choose 8 input images for base model training and 16 images for finetuning. |
| Hardware Specification | No | The paper provides details on model architecture and hyperparameters (e.g., resolutions, channel sizes, number of layers, codebook size, LoRA rank) in Section 4.2. However, it does not explicitly specify the hardware used for running the experiments, such as specific GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper describes various architectural components and parameters in Section 4.2, such as using a pre-trained vision transformer (ViT) and K-planes. However, it does not specify exact version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries/solvers. |
| Experiment Setup | Yes | To train our base model, we randomly choose 16 input images and camera poses to produce a triplane representation and predict the remaining novel views. We used two spatial resolutions {V1, V2} = {64, 128} and channel size C = 32 for our multi-resolution triplanes. The MLP decoders have 6 layers with hidden dimension 64 for the coarse and fine decoders, respectively. We set the codebook size K = 8192 and dimension C = 32. In the finetuning stage, we first generated an initial representation using predetermined 16 view indices and finetuned the triplane features with MLP decoders in an optimization-based approach. We finetuned the model using 24 images and tested it on the remaining views. ... For the DTU dataset, we choose 8 input images for base model training and 16 images for finetuning. We set the LoRA rank to 4 in every layer of the decoder. |
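The reported hyperparameters (multi-resolution triplanes at {64, 128} with C = 32 channels, a 6-layer MLP decoder with hidden dimension 64, and rank-4 LoRA adapters on every decoder layer) can be sketched as follows. This is a minimal NumPy illustration of those shapes and of the LoRA mechanism, not the authors' implementation; all function and variable names are assumptions.

```python
import numpy as np

CHANNELS = 32          # triplane feature channels C (per the paper)
RESOLUTIONS = (64, 128)  # multi-resolution triplane sizes {V1, V2}

def make_triplanes(resolutions=RESOLUTIONS, channels=CHANNELS, seed=0):
    """Three axis-aligned feature planes (xy, xz, yz) at each resolution."""
    rng = np.random.default_rng(seed)
    return {r: rng.standard_normal((3, channels, r, r)) * 0.01
            for r in resolutions}

class LoRALinear:
    """Linear layer whose base weight W stays frozen during finetuning;
    only the rank-r update B @ A is adapted per test instance."""
    def __init__(self, in_dim, out_dim, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.01  # frozen base
        self.A = rng.standard_normal((rank, in_dim)) * 0.01     # trainable
        self.B = np.zeros((out_dim, rank))  # trainable, zero-init => no-op at start

    def __call__(self, x):
        return x @ (self.W + self.B @ self.A).T

def make_decoder(in_dim=CHANNELS * len(RESOLUTIONS), hidden=64,
                 depth=6, out_dim=4, rank=4):
    """6-layer MLP decoder with LoRA on every layer; output is RGB + density
    (the 4-channel output head is an illustrative assumption)."""
    dims = [in_dim] + [hidden] * (depth - 1) + [out_dim]
    return [LoRALinear(dims[i], dims[i + 1], rank=rank, seed=i)
            for i in range(depth)]

def decode(layers, x):
    """Run the MLP with ReLU between layers and a raw final output."""
    for layer in layers[:-1]:
        x = np.maximum(layer(x), 0.0)
    return layers[-1](x)
```

Because B is zero-initialized, the generated (base) representation renders identically before finetuning, and per-instance adaptation only needs to store the small A and B matrices, which is consistent with the compact-code-size claim in the review above.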