Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding

Authors: Eric Lei, Hamed Hassani, Shirin Saeedi Bidokhti

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first discuss LTC on synthetic sources, before moving to real-world sources. Then, we demonstrate how LTC performs with nested lattices, and finally present results on block coding using BLTC. In addition, we provide an ablation study demonstrating the effects of various components in LTC.
Researcher Affiliation | Academia | Eric Lei, Hamed Hassani & Shirin Saeedi Bidokhti, Department of Electrical and Systems Engineering, University of Pennsylvania
Pseudocode | No | The paper describes methods and mathematical formulations but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code can be found at https://github.com/leieric/lattice-transform-coding.
Open Datasets | Yes | For Physics and Speech, we map x to a latent space with dimension dy using MLP-based transforms. For these sources, we found training instabilities with STE, so dithering is used instead. At higher rates, we also observed more stable training with the factorized density model compared to the normalizing flow. For Speech, we observe coding gain for rates above 7 bits with the D4 and E8 lattices, shown in Fig. 7. Coding gain is observed on Physics for LTC in different rate regimes. LTC with the hexagonal lattice performs best at lower rates, but the performance drops off as rate increases. At higher rates, LTC with 4 latent dimensions performs the best with the D4 lattice. These rates are measured in bits-per-sample, and not normalized by dimension. Generally speaking, different rate regimes may require a different number of latent dimensions to be used for NTC and LTC. If the number of latent dimensions used by the transforms does not match the lattice quantizer dimension, this can result in suboptimal performance, as the tessellation in source space may not be optimal. This is likely why LTC performance peaks for different latent dimensions. For large-scale images, we focus on the NTC architecture in Cheng et al. (2020), and use product E8 and Leech Λ24 lattices along the channel dimension of the latent. We use dither during training and STE at test time, and train on the Vimeo-90k (Xue et al., 2019) dataset. We use identical hyperparameter settings for transforms and entropy models, corresponding to the first 4 quality levels for the cheng2020-attn model in CompressAI (Bégaint et al., 2020). Shown in Fig. 7, LTC with product E8 and product Leech achieves −5.274% and −16.708% BD-rate gains, respectively, over NTC when evaluated on Kodak.
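The dither-at-training / STE-at-test scheme the excerpt describes can be sketched for a single D4 lattice. This is a minimal NumPy sketch, not the paper's code: the nearest-point decoder follows the standard Conway–Sloane rule for D4, and the training dither is drawn uniformly from the lattice's Voronoi cell by rejection sampling (valid because the covering radius of D4 is 1). The names `quantize_D4` and `sample_voronoi_dither` are illustrative.

```python
import numpy as np

def quantize_D4(y):
    """Nearest-point decoder for the D4 lattice (integer 4-vectors with even
    coordinate sum): round each coordinate; if the coordinate sum is odd,
    re-round the coordinate with the largest rounding error the other way."""
    z = np.round(y)
    if int(z.sum()) % 2 != 0:
        err = y - z
        i = int(np.argmax(np.abs(err)))
        z[i] += np.sign(err[i]) if err[i] != 0 else 1.0
    return z

def sample_voronoi_dither(rng):
    """Uniform sample from the Voronoi cell of D4 around the origin, via
    rejection sampling from the bounding cube [-1, 1]^4."""
    while True:
        u = rng.uniform(-1.0, 1.0, size=4)
        if np.all(quantize_D4(u) == 0):
            return u

# Training pass: additive dither y + u is a differentiable surrogate for
# quantization. Test pass: hard nearest-point quantization (with STE, the
# backward pass would simply treat quantize_D4 as the identity).
rng = np.random.default_rng(0)
y = np.array([0.6, 0.1, 0.1, 0.1])
y_train = y + sample_voronoi_dither(rng)  # noisy surrogate used in training
y_test = quantize_D4(y)                   # hard quantization at test time
```

The excerpt's observation that dithering trained more stably than STE on Physics and Speech corresponds to swapping between these two paths during training.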
Dataset Splits | No | For all synthetic sources, we sample 10^7 samples as our training dataset. Finally, the Physics and Speech sources are taken from https://github.com/mandt-lab/RD-sandwich. The paper mentions a training dataset size for synthetic sources and uses several real-world datasets, but does not explicitly describe training, validation, or test splits, or how they were performed.
Hardware Specification | Yes | On NVIDIA RTX 5000 GPUs with 16 GB memory, LTC training until convergence took at most a few hours for the Speech and Physics sources, and minutes for the i.i.d. scalar sequences.
Software Dependencies | No | The paper mentions using specific software for datasets and models (e.g., "Tensorflow Compression library (Ballé et al., 2024)", "CompressAI (Bégaint et al., 2020)") but does not provide specific version numbers for any software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | In all experiments, we use a batch size of 64 and train until convergence using Adam. For the i.i.d. scalar sequences, we found that using CELU nonlinearities with no biases in the transforms sometimes helped improve training stability. For synthetic sources, Speech and Physics, the rate-distortion Lagrange multiplier λ is swept over 0.5, 1, 1.5, 2, 4, 8. For images, we use the default λ values in Bégaint et al. (2020). In addition, for images, a product lattice is applied along the channel dimension of the latent tensor, which has shape C × H × W. There are C/n lattices, applied H × W times (once per spatial position), where n is the lattice dimension.
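The product-lattice layout over the C × H × W latent can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's implementation: `product_lattice_quantize` is a hypothetical name, and any nearest-point decoder for an n-dimensional lattice can be passed in as `quantize_fn`.

```python
import numpy as np

def product_lattice_quantize(latent, quantize_fn, n):
    """Apply an n-dimensional lattice quantizer as a product lattice along the
    channel axis of a (C, H, W) latent: the C channels at each spatial
    position are split into C/n groups of n, so C/n lattices are applied
    H*W times. `quantize_fn` maps an (n,) vector to its nearest lattice point."""
    C, H, W = latent.shape
    assert C % n == 0, "channel count must be a multiple of the lattice dimension"
    # Move channels last and group them into n-dimensional blocks.
    blocks = latent.transpose(1, 2, 0).reshape(-1, n)
    quantized = np.apply_along_axis(quantize_fn, 1, blocks)
    # Undo the grouping to recover the (C, H, W) layout.
    return quantized.reshape(H, W, C).transpose(2, 0, 1)
```

As a sanity check, using plain rounding (the Z^n lattice) as `quantize_fn` reduces the product quantizer to elementwise rounding, regardless of how channels are grouped.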