Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion
Authors: Anle Ke, Xu Zhang, Tong Chen, Ming Lu, Chao Zhou, Jiawen Gu, Zhan Ma
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of ResULIC, achieving superior objective and subjective performance compared to state-of-the-art diffusion-based methods, with -80.7% and -66.3% BD-rate savings in terms of LPIPS and FID, respectively. |
| Researcher Affiliation | Collaboration | 1School of Electronic Science and Engineering, Nanjing University 2Kuaishou Technology. Correspondence to: Tong Chen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 PFO: Perceptual Fidelity Optimization; Algorithm 2 Compression-aware Diffusion (Sampling) |
| Open Source Code | Yes | Project page is available at https://njuvision.github.io/ResULIC/. |
| Open Datasets | Yes | For evaluation, we tested several widely used datasets including CLIC-2020 (Toderici et al., 2020) in the main paper, and Kodak (Kodak, 1993), DIV2K (Agustsson & Timofte, 2017), Tecnick (Asuni & Giachetti, 2014) and MSCOCO (Caesar et al., 2018) in Appendix A. |
| Dataset Splits | No | The paper states: "For evaluation on the CLIC2020, DIV2K, and Tecnick datasets, we followed the approach of CDC (Yang & Mandt, 2022) by resizing images to a short side of 768 and then center-cropping them to 768×768. For MSCOCO-3K, we randomly selected 3,000 images from the MSCOCO dataset and, following PerCo, resized them to 512×512 for testing." This describes the evaluation setup but does not specify the train/test/validation splits used for model training itself; it refers to existing test sets and their preprocessing, not an explicit data partitioning. |
| Hardware Specification | Yes | These data were all tested on the Kodak dataset using an RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions "Stable Diffusion v2.1 (Rombach et al., 2022) is used as the backbone diffusion model" and "the MLLM GPT-4o (OpenAI, 2024) is applied". While these are specific models, the paper does not list library or solver names with version numbers (e.g., PyTorch, Python, or CUDA versions) for the ancillary software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | During optimization, we use the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate set to 0.3. We balance the time cost and reconstruction quality by using 500 steps for optimization, which takes approximately 175s for a Kodak 768×512 image. In the first stage, we set λd and λp to 0, and λR to {24, 14, 4, 2, 1}, training for 150K iterations. In the second stage, we set λd to 1, λp to {0.4, 0.6, 0.8, 1, 1}, and λR to {36, 16, 6, 3, 1.5}, training for 100K iterations. During training, we center-crop images to a dimension of 512×512 and randomly set 30% of the captions to empty strings to enhance the model's generative capabilities and sensitivity to prompts. |
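The per-image optimization reported above (AdamW, learning rate 0.3, 500 steps) can be sketched as a minimal PyTorch loop. This is an illustrative stand-in, not the paper's PFO implementation: the real objective combines perceptual-fidelity terms, whereas here a plain MSE against a target latent is used only so the loop is runnable; `optimize_latent` and its arguments are hypothetical names.

```python
import torch
import torch.nn.functional as F

def optimize_latent(target, steps=500, lr=0.3):
    """Refine a latent tensor by gradient descent, mirroring the reported
    AdamW settings (lr=0.3, 500 steps). The MSE objective is a placeholder
    for the paper's perceptual-fidelity loss."""
    z = torch.zeros_like(target, requires_grad=True)  # latent being optimized
    opt = torch.optim.AdamW([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(z, target)  # stand-in objective
        loss.backward()
        opt.step()
    return z.detach(), loss.item()

# Toy usage: a small random "target" latent in place of a real encoder output.
target = torch.randn(1, 4, 8, 8)
z_opt, final_loss = optimize_latent(target, steps=50)
```

At the paper's 500 steps this per-image refinement is what drives the ~175 s cost on a 768×512 Kodak image; reducing the step count trades reconstruction quality for speed.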