On the choice of Perception Loss Function for Learned Video Compression

Authors: Sadaf Salehkalaibar, Truong Buu Phan, Jun Chen, Wei Yu, Ashish Khisti

NeurIPS 2023

Reproducibility Variable | Result | LLM Response

Research Type — Experimental
"Using information theoretic analysis and deep-learning based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low bit rates. ... We validate our results using (one-shot) information-theoretic analysis, a detailed study of the rate-distortion-perception tradeoff of the Gauss-Markov source model, as well as deep-learning based experiments on the moving MNIST and KTH datasets."

Researcher Affiliation — Academia
"Sadaf Salehkalaibar, ECE Department, University of Toronto, EMAIL; Buu Phan*, ECE Department, University of Toronto, EMAIL; Jun Chen, ECE Department, McMaster University, EMAIL; Wei Yu, ECE Department, University of Toronto, EMAIL; Ashish Khisti, ECE Department, University of Toronto, EMAIL"

Pseudocode — No
The paper does not contain any explicitly labeled pseudocode or algorithm blocks.

Open Source Code — No
"Code will be available at https://github.com/truongbuu/URDP_flow."

Open Datasets — Yes
"We validate our results using ... deep-learning based experiments on the moving MNIST and KTH datasets. ... Moving MNIST dataset [29] (with 1 digit) using Wasserstein GAN [30] ... Additional results on the KTH dataset [31] are available in Appendix J.3."

Dataset Splits — No
The paper mentions that the "training set contains 60000 images" but does not provide specific train/validation/test splits or a clear splitting methodology.

Hardware Specification — Yes
"Training takes 2 days per model on a single NVIDIA P100 GPU."

Software Dependencies — No
The paper mentions software such as 'Wasserstein GAN', 'scale-space flow model', 'conditional module', and the 'WGAN-GP framework', but does not provide specific version numbers for these or other software dependencies.

Experiment Setup — Yes
"We use a batch size of 64, the RMSProp optimizer with a learning rate of 5 × 10^-5, and train each model for 360 epochs, where the training set contains 60000 images. ... Under the WGAN-GP framework [30], we use a gradient penalty of 10 and update the encoders/decoders every 5 iterations. The parameters λ controlling the tradeoff are in Table 7."
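The reported setup (RMSProp at 5 × 10^-5, WGAN-GP with gradient penalty 10) can be sketched as follows. This is a minimal, hedged reconstruction assuming PyTorch; the `critic` and `model` here are placeholders, not the paper's scale-space-flow architecture or its released code.

```python
import torch

def gradient_penalty(critic, real, fake, gp_weight=10.0):
    """WGAN-GP gradient penalty term, using the weight of 10 reported in the paper."""
    # Per-sample random mixing factor between real and fake batches.
    eps = torch.rand(real.size(0), 1, 1, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(interp)
    # Gradient of the critic score with respect to the interpolated inputs.
    grads = torch.autograd.grad(score.sum(), interp, create_graph=True)[0]
    # Penalize deviation of the per-sample gradient norm from 1.
    return gp_weight * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# Optimizer settings as reported (learning rate 5e-5, batch size 64, 360 epochs).
# `model` is a stand-in encoder/decoder; the paper uses a scale-space flow model.
model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
optimizer = torch.optim.RMSprop(model.parameters(), lr=5e-5)
```

Per the quoted setup, the encoders/decoders would be updated once every 5 critic iterations, with the penalty term added to the critic loss at each critic step.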