On Quantizing Neural Representation for Variable-Rate Video Coding

Authors: Junqi Shi, Zhujia Chen, Hanfei Li, Qi Zhao, Ming Lu, Tong Chen, Zhan Ma

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental evaluations demonstrate that NeuroQuant significantly outperforms existing techniques in varying-bitwidth quantization and compression efficiency, accelerating encoding by up to eight times and enabling quantization down to INT2 with minimal reconstruction loss.
Researcher Affiliation | Academia | Junqi Shi, Zhujia Chen, Hanfei Li, Qi Zhao, Ming Lu, Tong Chen, Zhan Ma. School of Electronic Science and Engineering, Nanjing University. EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Bit Allocation for Target Bitrate (Mixed Precision, Sec. 3.1); Algorithm 2: Encoding for Target Rate (Calibration, Sec. 3.2)
Open Source Code | Yes | The materials will be available at https://github.com/Eric-qi/NeuroQuant.
Open Datasets | Yes | We conducted experiments on the UVG dataset, which consists of 7 videos, each with a resolution of 1920 × 1080 and recorded at 120 FPS over 5 or 2.5 seconds. We applied a center crop to achieve a resolution of 1920 × 960, similar to the preprocessing used in HNeRV and NeRV.
Dataset Splits | No | No explicit training/validation/test splits are provided for the video-specific model training. The paper states, "INR-VC encodes each video as a unique neural network through end-to-end training" and, for evaluation, "all frames of each video were evaluated." This describes the evaluation process but not how the frames within each video are partitioned for training and validation of these unique networks.
Hardware Specification | Yes | All experiments were conducted using PyTorch with Nvidia RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using "Pytorch" and the "Adam optimizer" but does not specify their version numbers, nor versions for other libraries or tools.
Experiment Setup | Yes | Once the bits are allocated, we employed the Adam optimizer (Kingma, 2014) to calibrate quantization parameters (e.g., quantization steps, weight rounding) to minimize distortion. For frame-wise INR-VC systems like NeRV and HNeRV, the batch size was set to 2, while for patch-wise INR-VC systems like HiNeRV, it was set to 144. The learning rate was set to 3e-3 with a cosine annealing strategy. QPs were optimized for 2.1 × 10^4 iterations, although most cases converged in fewer than 1.5 × 10^4 iterations.
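The center-crop preprocessing reported for the Open Datasets row (1920 × 1080 frames cropped to 1920 × 960, as in HNeRV/NeRV) can be sketched as follows; this is an illustrative implementation, not the authors' code, and the function name is our own:

```python
import torch

def center_crop(frames: torch.Tensor, target_h: int, target_w: int) -> torch.Tensor:
    """Center-crop a batch of frames shaped (N, C, H, W) to (target_h, target_w)."""
    _, _, h, w = frames.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return frames[:, :, top:top + target_h, left:left + target_w]

# UVG frames are 1920 wide x 1080 high; only the height is cropped, to 960.
frames = torch.zeros(2, 3, 1080, 1920)
cropped = center_crop(frames, target_h=960, target_w=1920)
print(tuple(cropped.shape))  # (2, 3, 960, 1920)
```

Cropping symmetrically from both the top and bottom keeps the frame center aligned with the original, which matches the usual meaning of "center crop" in NeRV-style pipelines.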
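The calibration setup described in the Experiment Setup row (Adam at lr 3e-3 with cosine annealing over up to 2.1 × 10^4 iterations) could look roughly like the sketch below. The parameter tensors and the distortion term are hypothetical stand-ins: the paper optimizes quantization steps and weight rounding against the reconstruction loss of the quantized INR, which we replace here with a dummy quadratic so the loop is self-contained.

```python
import torch

# Illustrative stand-ins for the quantization parameters (QPs) being calibrated;
# shapes and names are assumptions, not taken from the paper's code.
q_steps = torch.nn.Parameter(torch.ones(64))     # per-channel quantization steps
rounding = torch.nn.Parameter(torch.zeros(64))   # weight-rounding offsets

max_iters = 21000  # 2.1e4 iterations, as reported
optimizer = torch.optim.Adam([q_steps, rounding], lr=3e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_iters)

for _ in range(max_iters):
    optimizer.zero_grad()
    # Dummy distortion: in the actual method this would be the reconstruction
    # error of the video decoded from the quantized network weights.
    distortion = (q_steps - 0.5).pow(2).mean() + rounding.pow(2).mean()
    distortion.backward()
    optimizer.step()
    scheduler.step()  # cosine-anneal the learning rate toward zero
```

Cosine annealing decays the step size smoothly over the run, which is consistent with the report that most cases converge well before the iteration budget is exhausted.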