DiffPC: Diffusion-based High Perceptual Fidelity Image Compression with Semantic Refinement
Authors: Yichong Xia, Yimin Zhou, Jinpeng Wang, Baoyi An, Haoqian Wang, Yaowei Wang, Bin Chen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method achieves state-of-the-art perceptual fidelity and surpasses previous perceptual image compression methods by a significant margin in statistical fidelity. ... 4 EXPERIMENTS |
| Researcher Affiliation | Collaboration | Yichong Xia1,3, Yimin Zhou1, Jinpeng Wang1, Baoyi An4, Haoqian Wang1, Bin Chen2,3. 1Tsinghua Shenzhen International Graduate School, 2Harbin Institute of Technology, Shenzhen, 3Peng Cheng Laboratory, 4Huawei Technologies Company Ltd. |
| Pseudocode | Yes | Our DiffPC framework is illustrated in Figure 2 and the pseudocode for the algorithm can be found in Appendix A.2. ... Algorithm 1 Encoding Process ... Algorithm 2 Decoding Process |
| Open Source Code | Yes | Code is released at https://github.com/Darc8-sun/DIFFPC. |
| Open Datasets | Yes | For validation, we referenced (Muckley et al., 2023) and employed three widely recognized image compression benchmark datasets: CLIC2020 (George Toderici, 2020), DIV2K (Timofte et al., 2017), and Kodak (Company). ... Furthermore, following the approaches of (Hoogeboom et al., 2023; Careil et al., 2024), we validated the model's statistical fidelity using COCO30K (Lin et al., 2014) and present the results in Appendix A.8. Our model was trained on the LSDIR dataset (Li et al., 2023b) |
| Dataset Splits | Yes | Our model was trained on the LSDIR dataset (Li et al., 2023b), which comprises 84,991 high-definition natural images. ... For validation, we referenced (Muckley et al., 2023) and employed three widely recognized image compression benchmark datasets: CLIC2020 (George Toderici, 2020), DIV2K (Timofte et al., 2017), and Kodak (Company). |
| Hardware Specification | Yes | Additionally, all experiments were conducted on an Nvidia A6000 GPU. ... except for PerCo, all tests were conducted on the same Nvidia 3080 Ti GPU. Due to high VRAM usage during inference, PerCo was tested on an A6000 GPU, which has superior GFLOPS. |
| Software Dependencies | No | Our foundational conditional diffusion model leverages Stable Diffusion 2.1-base. ... For LPIPS, we utilized the lpips library, while DISTS was implemented using DISTS_pytorch. FID and KID metrics were calculated using functions provided by torchmetrics.image, with a feature size of 2048. |
| Experiment Setup | Yes | Our model was trained on the LSDIR dataset (Li et al., 2023b)... During training, these images were randomly cropped to a resolution of 512 x 512. Our foundational conditional diffusion model leverages Stable Diffusion 2.1-base. Throughout all training stages, we employed AdamW (Loshchilov, 2017) as the optimizer, with learning rates set at 1e-4 for the initial phase and 5e-5 for the subsequent phase. The batch size was consistently maintained at 2. In the initial training phase, we employed an entropy estimator SCCTX (He et al., 2022) with a group number of 3. To achieve compression at different bit rates, we set the parameter λ2 in Section 3.2 to 0.2 and then adjusted λ1 ∈ {4, 16, 64, 128}. At this stage, we train for 80,000 steps. In the second training phase, the parameters of the compressor were frozen. ... At this stage, we train for 60,000 steps. We did not apply warm-up in the first stage but utilized a LambdaLinearScheduler with parameters warm_up_steps=10000 and f_start=1e-6 in the second stage. For sampling, we utilized IDDPM (Nichol & Dhariwal, 2021) as the sampler with a uniform setting of 50 sampling steps... |
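The second-stage warm-up quoted above (LambdaLinearScheduler with warm_up_steps=10000, f_start=1e-6) can be sketched as a simple learning-rate multiplier. This is a minimal stand-in, not the authors' code: it assumes the multiplier ramps linearly from f_start to 1.0 over the warm-up window and then holds, which is how the LambdaLinearScheduler in the Stable Diffusion codebase behaves with its default target factor; the function name and f_max parameter are ours.

```python
def lr_multiplier(step, warm_up_steps=10000, f_start=1e-6, f_max=1.0):
    """Linear warm-up multiplier applied to the base learning rate.

    Assumed behavior: ramp linearly from f_start to f_max over
    warm_up_steps optimizer steps, then hold at f_max.
    """
    if step < warm_up_steps:
        return f_start + (f_max - f_start) * step / warm_up_steps
    return f_max


base_lr = 5e-5  # second-phase base learning rate from the paper
print(lr_multiplier(0))                 # f_start at step 0
print(lr_multiplier(5000))              # ≈ 0.5 halfway through warm-up
print(base_lr * lr_multiplier(20000))   # full base LR after warm-up
```

In a PyTorch training loop this multiplier would typically be wired in via `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_multiplier)`.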