Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set

Authors: Ties van Rozendaal, Johann Brehmer, Yunfan Zhang, Reza Pourreza, Auke J. Wiggers, Taco Cohen

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental On UVG, HEVC, and Xiph datasets, our codec improves the performance of a scale-space flow model by between 21% and 27% BD-rate savings, and that of a state-of-the-art B-frame model by 17% to 20% BD-rate savings. We also demonstrate that instance-adaptive finetuning improves the robustness to domain shift. Finally, our approach reduces the capacity requirements of compression models. We show that it enables competitive performance even after reducing the network size by 70%. Figure 1 shows the rate-distortion curves of our instance-adaptive video codec (InstA) as well as neural and traditional baselines. Both for SSF in the P-frame setting and B-EPIC in the B-frame setting, the instance-adaptive models clearly outperform the corresponding base models.
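The BD-rate savings quoted above are Bjøntegaard-delta rates, the standard summary of the average bitrate difference between two rate-distortion curves at equal quality. The following is a minimal sketch of that standard computation, not code from the paper; the function name and inputs are illustrative:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard-delta rate: average percent bitrate difference between
    two RD curves over their overlapping quality range (standard method)."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Fit cubic polynomials of log-rate as a function of quality (PSNR).
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate each fit over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0  # negative = bitrate savings
```

A test curve that reaches the same PSNR at half the anchor's bitrate yields a BD-rate of -50%.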
Researcher Affiliation Industry Ties van Rozendaal, Johann Brehmer, Yunfan Zhang, Reza Pourreza, Auke Wiggers, Taco S. Cohen; Qualcomm AI Research (an initiative of Qualcomm Technologies, Inc.).
Pseudocode No The paper describes the encoding procedure in a numbered list within Section 3.4, but it is presented as natural-language steps rather than structured pseudocode or an algorithm block. For example: "Our procedure follows van Rozendaal et al. (2021) and mainly differs in the choice of hyper-parameters and application to video auto-encoder models. For completeness we shall describe the full method here. A video sequence x is compressed by: 1. finetuning the model parameters (θ, φ) of the base model on the sequence x using Eq. (2), 2. computing the latent codes z ∼ qφ(z|x), 3. parameterizing the finetuned decoder and prior parameters as updates δ = θ − θ_D, 4. quantizing the latent codes z as well as the prior and decoder parameter updates δ, and 5. compressing the quantized latents z and updates δ with entropy coding to the bitstream."
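The five quoted steps could be rendered as code roughly as follows. This is a hypothetical sketch with toy NumPy stand-ins for the model, the finetuning objective, and the quantizers; none of these names come from the authors' implementation:

```python
import numpy as np

def encode_instance_adaptive(theta_base, video, finetune_steps=3, lr=0.1):
    # 1. Finetune the base parameters on this one sequence; a few toy
    #    gradient steps toward the video mean stand in for minimizing Eq. (2).
    theta = theta_base.copy()
    for _ in range(finetune_steps):
        theta -= lr * (theta - video.mean())

    # 2. Compute the latent codes z ~ q_phi(z|x); here a toy residual.
    z = video - theta

    # 3. Parameterize the finetuned parameters as updates from the base model.
    delta = theta - theta_base

    # 4. Quantize latents (bin width 1) and updates (fine fixed grid).
    z_bar = np.round(z)
    delta_bar = np.round(delta / 0.001) * 0.001

    # 5. Entropy coding would compress z_bar and delta_bar to the bitstream;
    #    here we simply return them.
    return z_bar, delta_bar
```

The point of step 3 is that only the small updates δ, not the full finetuned weights, need to be transmitted alongside the latents.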
Open Source Code No The paper states: "Finally, in the supplementary material we provide the performance of the various methods on each video sequence as a CSV file to aid comparisons." This refers to data, not source code. There is no explicit statement about releasing code or a link to a code repository for the methodology described.
Open Datasets Yes We use sequences from five different datasets. The global models are trained on Vimeo90k (Xue et al., 2019). We evaluate on the HEVC class-B test sequences (HEVC, 2013), on the UVG-1k (Mercat et al., 2020) dataset, and on Xiph-5N (van Rozendaal et al., 2021), which entails five Xiph.org test sequences. The performance on out-of-distribution data is tested on two sequences from the animated short film Big Buck Bunny, also part of the Xiph.org collection (Xiph.org).
Dataset Splits Yes The models are trained with a GoP size of 3 frames, which means that we split the training video into chunks of 3 frames and randomly sample chunks during training. We finally evaluate the models with a GoP size of 12. Further increasing the GoP size leads to diminishing returns in rate-distortion performance, as we demonstrate in Appendix D. For the B-EPIC model, we use the model trained by Pourreza & Cohen (2021) on Vimeo-90k. The setup is similar to that for the SSF models, except that B-EPIC's more complicated GoP structure requires training with a GoP size of 4 frames. At test time we use a GoP size of 12; the frame configurations are described in Appendix B. In the P-frame scenario we use a GoP size of 3 and finetune on full-resolution frames (1920 × 1080 pixels) with a batch size of 1 and a learning rate of 10⁻⁵. After finetuning, we transmit sequences with a GoP size of 12.
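The GoP-based chunking described above (split the training video into chunks of a few consecutive frames, then sample chunks at random each training step) might look like this hypothetical sketch; `sample_gop_chunk` is an illustrative name:

```python
import random

def sample_gop_chunk(num_frames, gop_size=3, rng=random):
    """Randomly pick one GoP-sized chunk of consecutive frame indices
    from a video of num_frames frames (toy illustration)."""
    # Starting indices of all full, non-overlapping chunks of gop_size frames.
    starts = list(range(0, num_frames - gop_size + 1, gop_size))
    start = rng.choice(starts)
    return list(range(start, start + gop_size))
```

At training time one such chunk would be drawn per step; at test time the sequence is instead processed in fixed GoPs of 12 frames.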
Hardware Specification Yes We report the walltime on machines with 40-core Intel Xeon Gold 6230 CPUs with 256 GB RAM and NVIDIA Tesla V100-SXM2 GPUs with 32 GB VRAM. We only use a single GPU.
Software Dependencies Yes We generate H.265 and H.264 results using version v3.4.8 of ffmpeg (FFmpeg). We are grateful to the authors and maintainers of ffmpeg (FFmpeg), Matplotlib (Hunter, 2007), NumPy (Charles R. Harris et al., 2020), OpenCV (Bradski, 2000), pandas (McKinney, 2010), Python (Python core team, 2019), PyTorch (Paszke et al., 2017), SciPy (SciPy contributors, 2020), and seaborn (Waskom, 2021).
Experiment Setup Yes The scale-space flow models described in Sec. 3 are trained with the MSE training setup described in Agustsson et al. (2020)... We first train for 1 million steps on 256 × 256 crops with a learning rate of 10⁻⁴. We then conduct the MSE finetune stage of the training procedure from Agustsson et al. (2020) (not to be confused with instance-adaptive finetuning) for the SSF18 model, where we train on crops of size h × w = 256 × 384 with a learning rate of 10⁻⁵. The models are trained with a GoP size of 3 frames... On each instance, we finetune the models with the InstA objective in Eq. (2), using the same weight β as used to train the corresponding global model. We finetune for up to two weeks, corresponding to an average of 300,000 steps. In the P-frame scenario we use a GoP size of 3 and finetune on full-resolution frames (1920 × 1080 pixels) with a batch size of 1 and a learning rate of 10⁻⁵. To discretize the updates δ, we use a fixed grid of n equal-sized bins of width t centered around δ = 0 and clip values at the tails. The quantization of z is analogous, except that we use a bin width of t = 1 and do not clip the values at the tails (in line with Ballé et al. (2018)). For the experiments in this paper we use a bin width t = 0.001, σ = 0.05, s = t/6, a spike-slab ratio α = 100, and a number of quantization bins of n = 289.
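The fixed-grid quantization the quote describes (n equal-sized bins of width t centered on δ = 0, clipped at the tails for the updates; bin width 1 and no clipping for the latents) can be sketched as follows. The function names are illustrative, not from the paper:

```python
import numpy as np

def quantize_updates(delta, t=0.001, n=289):
    """Quantize parameter updates on a fixed grid of n bins of width t,
    centered on delta = 0, clipping values at the tails."""
    half = (n - 1) // 2                              # bin indices -half..+half
    idx = np.clip(np.round(delta / t), -half, half).astype(int)
    return idx * t                                    # dequantized values

def quantize_latents(z):
    """Latents use the same scheme with bin width 1 and no clipping."""
    return np.round(z)

updates = np.array([0.00012, -0.0034, 0.5])  # last value lies beyond the grid
quantize_updates(updates)                    # 0.5 is clipped to the outermost bin
```

With t = 0.001 and n = 289, the grid covers updates in roughly [-0.144, 0.144], so any larger finetuning update is saturated at the edge of the grid.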