DiffPC: Diffusion-based High Perceptual Fidelity Image Compression with Semantic Refinement

Authors: Yichong Xia, Yimin Zhou, Jinpeng Wang, Baoyi An, Haoqian Wang, Yaowei Wang, Bin Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method achieves state-of-the-art perceptual fidelity and surpasses previous perceptual image compression methods by a significant margin in statistical fidelity. ... 4 EXPERIMENTS
Researcher Affiliation | Collaboration | Yichong Xia1,3, Yimin Zhou1, Jinpeng Wang1, Baoyi An4, Haoqian Wang1, Bin Chen2,3; 1Tsinghua Shenzhen International Graduate School, 2Harbin Institute of Technology, Shenzhen, 3Peng Cheng Laboratory, 4Huawei Technologies Company Ltd.
Pseudocode | Yes | Our DiffPC framework is illustrated in Figure 2 and the pseudocode for the algorithm can be found in Appendix A.2. ... Algorithm 1 Encoding Process ... Algorithm 2 Decoding Process
Open Source Code | Yes | Code is released at https://github.com/Darc8-sun/DIFFPC.
Open Datasets | Yes | For validation, we referenced (Muckley et al., 2023) and employed three widely recognized image compression benchmark datasets: CLIC2020 (George Toderici, 2020), DIV2K (Timofte et al., 2017), and Kodak (Company). ... Furthermore, following the approaches of (Hoogeboom et al., 2023; Careil et al., 2024), we validated the model's statistical fidelity using COCO30K (Lin et al., 2014) and present the results in Appendix A.8. Our model was trained on the LSDIR dataset (Li et al., 2023b).
Dataset Splits | Yes | Our model was trained on the LSDIR dataset (Li et al., 2023b), which comprises 84,991 high-definition natural images. ... For validation, we referenced (Muckley et al., 2023) and employed three widely recognized image compression benchmark datasets: CLIC2020 (George Toderici, 2020), DIV2K (Timofte et al., 2017), and Kodak (Company).
Hardware Specification | Yes | Additionally, all experiments were conducted on an Nvidia A6000 GPU. ... Except for Perco, all tests were conducted on the same Nvidia 3080ti GPU. Due to high VRAM usage during inference, Perco was tested on an A6000 GPU, which has superior GFLOPS.
Software Dependencies | No | Our foundational conditional diffusion model leverages Stable Diffusion 2.1-base. ... For LPIPS, we utilized the lpips library, while DISTS was implemented using DISTS pytorch. FID and KID metrics were calculated using functions provided by torchmetrics.image, with a feature size of 2048.
Experiment Setup | Yes | Our model was trained on the LSDIR dataset (Li et al., 2023b)... During training, these images were randomly cropped to a resolution of 512 x 512. Our foundational conditional diffusion model leverages Stable Diffusion 2.1-base. Throughout all training stages, we employed AdamW (Loshchilov, 2017) as the optimizer, with learning rates set at 1e-4 for the initial phase and 5e-5 for the subsequent phase. The batch size was consistently maintained at 2. In the initial training phase, we employed an entropy estimator SCCTX (He et al., 2022) with a group number of 3. To achieve compression at different bit rates, we set the parameter λ2 in Section 3.2 to 0.2 and then adjusted λ1 ∈ {4, 16, 64, 128}. This stage is trained for 80,000 steps. In the second training phase, the parameters of the compressor were frozen. ... This stage is trained for 60,000 steps. We did not apply warm-up in the first stage but utilized a Lambda Linear Scheduler with parameters warm_up_steps=10000 and f_start=1e-6 in the second stage. For sampling, we utilized IDDPM (Nichol & Dhariwal, 2021) as the sampler with a uniform setting of 50 sampling steps...
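The second-phase warm-up described above (Lambda Linear Scheduler, warm_up_steps=10000, f_start=1e-6, base learning rate 5e-5) can be sketched as a pure-Python learning-rate multiplier. This is a minimal sketch, not the authors' implementation: the paper only names the scheduler and its two parameters, so the exact shape (linear ramp from f_start to 1.0, then held constant) and the f_max=1.0 plateau are assumptions here.

```python
def lambda_linear_warmup(step, warm_up_steps=10_000, f_start=1e-6, f_max=1.0):
    """Learning-rate multiplier: ramps linearly from f_start to f_max over
    warm_up_steps, then holds at f_max. The ramp-then-hold shape is an
    assumption; the quoted setup specifies only warm_up_steps and f_start."""
    if step < warm_up_steps:
        return f_start + (f_max - f_start) * step / warm_up_steps
    return f_max

# Effective second-phase learning rate at a few steps (base LR 5e-5 per the setup)
base_lr = 5e-5
lr_start = base_lr * lambda_linear_warmup(0)        # f_start * base_lr, near zero
lr_mid = base_lr * lambda_linear_warmup(5_000)      # roughly half the base LR
lr_after = base_lr * lambda_linear_warmup(10_000)   # full base LR once warmed up
```

In a PyTorch training loop this multiplier would typically be passed to `torch.optim.lr_scheduler.LambdaLR` wrapping the AdamW optimizer, with the scheduler stepped once per training iteration.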