Progressive Compression with Universally Quantized Diffusion Models

Authors: Yibo Yang, Justus Will, Stephan Mandt

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We obtain promising first results on image compression, achieving competitive rate-distortion and rate-realism results on a wide range of bit-rates with a single model, bringing neural codecs a step closer to practical deployment. Our code can be found at https://github.com/mandt-lab/uqdm. ... (Section 5, Experiments) We train UQDM end-to-end by directly optimizing the NELBO loss, eq. (1), summing L_t across all time steps.
Researcher Affiliation | Academia | Yibo Yang, Justus C. Will, Stephan Mandt; Department of Computer Science, University of California, Irvine
Pseudocode | Yes | Algorithm 1 (Encoding): z_T ~ p(z_T); for t = T, ..., 2, 1 do ... Algorithm 2 (Decoding): z_T ~ p(z_T), using shared seed; for t = T, ..., 2, 1 do ...
Open Source Code | Yes | Our code can be found at https://github.com/mandt-lab/uqdm.
Open Datasets | Yes | We start with the CIFAR10 dataset containing 32×32 images. ... Finally, we present results on the ImageNet 64×64 dataset. ... We obtain initial insights into the behavior of our proposed UQDM by experimenting on toy swirl data (see Appendix C.1 for details) and comparing with the hypothetical performance of VDM (Kingma et al., 2021). ... We use the swirl data from the codebase of Kingma et al. (2021).
Dataset Splits | Yes | We start with the CIFAR10 dataset containing 32×32 images. ... Finally, we present results on the ImageNet 64×64 dataset. ... We use the swirl data from the codebase of Kingma et al. (2021).
Hardware Specification | Yes | Around 0.6 s for encoding and 0.5 s for decoding on an Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz; 0.5 s for encoding and 0.3 s for decoding on a single Quadro RTX 8000 GPU.
Software Dependencies | No | We implemented the progressive codec using tensorflow-compression (Ballé et al.), and found the actual file size to be within 3% of the theoretical NELBO. ... Thus we expect the coding speed to be dramatically faster with a parallel implementation of entropy coding, e.g., using the DietGPU library.
Experiment Setup | Yes | We found a small T (< 10) to give the best compression performance. ... For our UQDM model, we empirically find that T = 4 yields the best trade-off between bit-rate and reconstruction quality. ... We use the noise schedule σ²_t = σ(γ_t), where γ_t is linear in t with learned endpoints γ_T and γ_0. ... We use a U-Net of depth 8, consisting of 8 ResNet blocks in the forward direction and 9 ResNet blocks in the reverse direction, with a single attention layer and two additional ResNet blocks in the middle. We keep the number of channels constant at 128 throughout.
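The shared-seed encoding/decoding pattern in the pseudocode row reflects universal (dithered) quantization: encoder and decoder draw the same uniform dither from a common seed, so only integers need to be entropy-coded. The sketch below illustrates that mechanism in isolation; the function names and the use of NumPy's `default_rng` are illustrative assumptions, not the paper's implementation, and no entropy coding is included.

```python
import numpy as np

def uq_encode(x, seed):
    """Universal quantization, encoder side (illustrative sketch).

    Both sides derive the same dither u ~ Uniform(-1/2, 1/2) from the
    shared seed; the encoder transmits the integers k = round(x - u).
    """
    u = np.random.default_rng(seed).uniform(-0.5, 0.5, size=np.shape(x))
    return np.round(x - u)

def uq_decode(k, seed):
    """Decoder side: regenerate the same dither and add it back.

    The reconstruction k + u equals x plus noise that is uniform on
    [-1/2, 1/2], matching the additive-noise channel of the model.
    """
    u = np.random.default_rng(seed).uniform(-0.5, 0.5, size=np.shape(k))
    return k + u
```

The key property is that the reconstruction error is bounded by 1/2 elementwise, regardless of the input, while the transmitted symbols are integers amenable to entropy coding.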
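The noise schedule quoted in the Experiment Setup row, σ²_t = σ(γ_t) with γ_t linear in t, can be sketched numerically as follows. In the paper the endpoints γ_0 and γ_T are learned; here they are passed in as plain floats for illustration, and σ(·) denotes the logistic sigmoid.

```python
import numpy as np

def sigma2(t, gamma_0, gamma_T, T):
    """Noise variance sigma^2_t = sigmoid(gamma_t), gamma_t linear in t.

    gamma_0 and gamma_T stand in for the learned endpoints; t runs
    from 0 (least noise) to T (most noise).
    """
    gamma_t = gamma_0 + (gamma_T - gamma_0) * (t / T)
    return 1.0 / (1.0 + np.exp(-gamma_t))  # logistic sigmoid
```

With γ_T > γ_0, the variance increases monotonically from σ(γ_0) near zero noise to σ(γ_T) near full noise, which is what makes a very small number of steps (e.g. T = 4) expressible with just two learned scalars.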