Progressive Compression with Universally Quantized Diffusion Models
Authors: Yibo Yang, Justus Will, Stephan Mandt
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We obtain promising first results on image compression, achieving competitive rate-distortion and rate-realism results on a wide range of bit-rates with a single model, bringing neural codecs a step closer to practical deployment. Our code can be found at https://github.com/mandt-lab/uqdm. 5 EXPERIMENTS We train UQDM end-to-end by directly optimizing the NELBO loss eq. (1), summing up L_t across all time steps. |
| Researcher Affiliation | Academia | Yibo Yang, Justus C. Will, Stephan Mandt — Department of Computer Science, University of California, Irvine |
| Pseudocode | Yes | Algorithm 1 Encoding: z_T ∼ p(z_T); for t = T, …, 2, 1 do … Algorithm 2 Decoding: z_T ∼ p(z_T) using shared seed; for t = T, …, 2, 1 do … |
| Open Source Code | Yes | Our code can be found at https://github.com/mandt-lab/uqdm. |
| Open Datasets | Yes | We start with the CIFAR10 dataset containing 32×32 images. ... Finally, we present results on the ImageNet 64×64 dataset. ... We obtain initial insights into the behavior of our proposed UQDM by experimenting on toy swirl data (see Appendix C.1 for details) and comparing with the hypothetical performance of VDM (Kingma et al., 2021). ... We use the swirl data from the codebase of Kingma et al. (2021) |
| Dataset Splits | Yes | We start with the CIFAR10 dataset containing 32×32 images. ... Finally, we present results on the ImageNet 64×64 dataset. ... We use the swirl data from the codebase of Kingma et al. (2021) |
| Hardware Specification | Yes | Around 0.6 s for encoding and 0.5 s for decoding on an Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz; 0.5 s for encoding and 0.3 s for decoding on a single Quadro RTX 8000 GPU. |
| Software Dependencies | No | We implemented the progressive codec using tensorflow-compression (Ballé et al.), and found the actual file size to be within 3% of the theoretical NELBO. ... Thus we expect the coding speed to be dramatically faster with a parallel implementation of entropy coding, e.g., using the DietGPU library. |
| Experiment Setup | Yes | We found a small T (< 10) to give the best compression performance... For our UQDM model we empirically find that T = 4 yields the best trade-off between bit-rate and reconstruction quality. ... We use the noise schedule σ²_t = σ(γ_t) where γ_t is linear in t with learned endpoints γ_T and γ_0. ... We use a U-Net of depth 8, consisting of 8 ResNet blocks in the forward direction and 9 ResNet blocks in the reverse direction, with a single attention layer and two additional ResNet blocks in the middle. We keep the number of channels constant throughout at 128. |
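The noise schedule quoted in the Experiment Setup row, σ²_t = σ(γ_t) with γ_t linear in t between learned endpoints, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the endpoint values `gamma_0` and `gamma_T` are placeholders (in the paper they are learned parameters), and the function and argument names are our own.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def noise_schedule(t, gamma_0, gamma_T):
    """Variance sigma^2_t = sigmoid(gamma_t), where gamma_t interpolates
    linearly in t in [0, 1] between the endpoints gamma_0 and gamma_T
    (learned in the paper; fixed here for illustration)."""
    gamma_t = gamma_0 + t * (gamma_T - gamma_0)
    return sigmoid(gamma_t)

# Illustrative endpoints only: variance grows monotonically with t.
ts = np.linspace(0.0, 1.0, 5)
sigma2 = noise_schedule(ts, gamma_0=-6.0, gamma_T=6.0)
```

With γ_0 < γ_T the variance increases monotonically from near 0 toward 1, matching the usual diffusion convention of progressively noisier latents.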