Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion

Authors: Anle Ke, Xu Zhang, Tong Chen, Ming Lu, Chao Zhou, Jiawen Gu, Zhan Ma

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate the effectiveness of ResULIC, achieving superior objective and subjective performance compared to state-of-the-art diffusion-based methods, with -80.7% and -66.3% BD-rate savings in terms of LPIPS and FID, respectively.
Researcher Affiliation Collaboration 1 School of Electronic Science and Engineering, Nanjing University; 2 Kuaishou Technology. Correspondence to: Tong Chen <EMAIL>.
Pseudocode Yes Algorithm 1: PFO (Perceptual Fidelity Optimization); Algorithm 2: Compression-aware Diffusion (Sampling)
Open Source Code Yes Project page is available at https://njuvision.github.io/ResULIC/.
Open Datasets Yes For evaluation, we tested several widely used datasets including CLIC-2020 (Toderici et al., 2020) in the main paper, and Kodak (Kodak, 1993), DIV2K (Agustsson & Timofte, 2017), Tecnick (Asuni & Giachetti, 2014) and MSCOCO (Caesar et al., 2018) in Appendix A.
Dataset Splits No The paper states: "For evaluation on the CLIC2020, DIV2K, and Tecnick datasets, we followed the approach of CDC (Yang & Mandt, 2022) by resizing images to a short side of 768 and then center-cropping them to 768×768. For MSCOCO-3K, we randomly selected 3,000 images from the MSCOCO dataset and, following PerCo, resized them to 512×512 for testing." This describes the evaluation setup but does not specify the train/test/validation splits used for model training itself. It refers to preprocessing of existing test sets, not an explicit partitioning of the authors' own training data.
Hardware Specification Yes These data were all tested on the Kodak dataset using an RTX 4090 GPU.
Software Dependencies No The paper mentions that "Stable Diffusion v2.1 (Rombach et al., 2022) is used as the backbone diffusion model" and that "the MLLM GPT-4o (OpenAI, 2024) is applied". While these are specific models, the paper does not list library names with version numbers (e.g., PyTorch, Python, or CUDA versions) for the ancillary software dependencies needed to replicate the experiments.
Experiment Setup Yes During optimization, we use the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 0.3. We balance time cost and reconstruction quality by using 500 optimization steps, which takes approximately 175 s for a 768×512 Kodak image. In the first stage, we set λd and λp to 0 and λR to {24, 14, 4, 2, 1}, training for 150K iterations. In the second stage, we set λd to 1, λp to {0.4, 0.6, 0.8, 1, 1}, and λR to {36, 16, 6, 3, 1.5}, training for 100K iterations. During training, we center-crop images to 512×512 and randomly set 30% of the captions to empty strings to enhance the model's generative capabilities and sensitivity to prompts.
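The two-stage schedule and caption dropout quoted above can be sketched as a small configuration, for concreteness. This is a minimal illustration assuming a plain Python dict layout; the variable names (STAGE_1, STAGE_2, maybe_drop_caption) are hypothetical and not from the authors' code.

```python
import random

# Hypothetical encoding of the paper's two-stage training schedule.
# Each lambda_R / lambda_p list corresponds to one rate point.
STAGE_1 = {"lambda_d": 0.0, "lambda_p": 0.0,
           "lambda_R": [24, 14, 4, 2, 1], "iterations": 150_000}
STAGE_2 = {"lambda_d": 1.0, "lambda_p": [0.4, 0.6, 0.8, 1.0, 1.0],
           "lambda_R": [36, 16, 6, 3, 1.5], "iterations": 100_000}

def maybe_drop_caption(caption: str, p: float = 0.3, rng=random) -> str:
    """Replace the caption with an empty string with probability p,
    mirroring the paper's 30% caption dropout during training."""
    return "" if rng.random() < p else caption
```

As described in the quoted setup, dropping a fraction of captions to empty strings is a common classifier-free-guidance-style trick to keep the diffusion backbone usable both with and without a text prompt.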