Controllable Distortion-Perception Tradeoff Through Latent Diffusion for Neural Image Compression
Authors: Chuqin Zhou, Guo Lu, Jiangchuan Li, Xiangyu Chen, Zhengxue Cheng, Li Song, Wenjun Zhang
AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our method significantly enhances the pretrained codecs with a wide, adjustable distortion-perception range while maintaining their original compression capabilities. For instance, we can achieve more than 150% improvement in LPIPS-BDRate without sacrificing more than 1 dB in PSNR. Extensive experiments demonstrate the effectiveness of our framework. We evaluate our method using both distortion metrics and perceptual quality metrics. All evaluations are performed on full-resolution images. |
| Researcher Affiliation | Collaboration | Chuqin Zhou1, Guo Lu1*, Jiangchuan Li1, Xiangyu Chen2, Zhengxue Cheng1, Li Song1, Wenjun Zhang1 1Shanghai Jiao Tong University 2Institute of Artificial Intelligence (Tele AI), China Telecom |
| Pseudocode | No | The paper includes architectural diagrams (Figures 1, 2, 3, 4) and mathematical equations, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper states: "We limit comparisons to studies with publicly available codes and models for consistent testing and evaluation." This refers to other works. There is no explicit statement or link indicating that the authors' own code for the described methodology is publicly available. |
| Open Datasets | Yes | For training, we use a high-quality Flickr2W dataset (Liu et al. 2020), and randomly crop images to a resolution of 256×256. The evaluations are conducted on two common image compression benchmark datasets: the CLIC2020 test set and the Kodak dataset. |
| Dataset Splits | No | For training, we use a high-quality Flickr2W dataset (Liu et al. 2020), and randomly crop images to a resolution of 256×256. The evaluations are conducted on two common image compression benchmark datasets: the CLIC2020 test set and the Kodak dataset. The paper mentions the datasets used for training and evaluation (test sets), but it does not provide specific training/validation/test splits for the Flickr2W dataset or specify how the benchmark datasets were used beyond evaluation. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions the "AdamW optimizer" but does not specify any software libraries or frameworks with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | For training, we use a high-quality Flickr2W dataset (Liu et al. 2020), and randomly crop images to a resolution of 256×256. To train the auxiliary encoder and the adaptive latent fusion module, we use the AdamW optimizer with a batch size of 32. The learning rate is maintained at a fixed value of 5×10⁻⁵. |
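The headline result quoted above is stated in LPIPS-BDRate, i.e., a Bjøntegaard delta rate computed over rate-LPIPS curves: the average bitrate change, at equal quality, between two rate-distortion curves. As context for that metric (this is an illustrative pure-Python sketch of the standard BD-rate calculation, not the authors' evaluation code; all function names here are our own):

```python
import math


def _polyfit(xs, ys, deg=3):
    """Least-squares polynomial fit via normal equations and Gaussian elimination."""
    n = deg + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = aty[r] - sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / ata[r][r]
    return coeffs  # coeffs[i] multiplies x**i


def _poly_integral(coeffs, lo, hi):
    """Definite integral of a polynomial given by its coefficient list."""
    return sum(c / (i + 1) * (hi ** (i + 1) - lo ** (i + 1))
               for i, c in enumerate(coeffs))


def _fit_and_integrate(quality, log_rates, lo, hi):
    """Fit log-rate as a cubic in (centered) quality and integrate over [lo, hi]."""
    mu = sum(quality) / len(quality)  # center x values for numerical stability
    coeffs = _polyfit([q - mu for q in quality], log_rates)
    return _poly_integral(coeffs, lo - mu, hi - mu)


def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Average bitrate change (%) of the test codec vs. the reference at
    equal quality, following Bjontegaard's method (cubic fit in log-rate)."""
    log_ref = [math.log10(r) for r in rates_ref]
    log_test = [math.log10(r) for r in rates_test]
    lo = max(min(quality_ref), min(quality_test))  # overlapping quality range
    hi = min(max(quality_ref), max(quality_test))
    avg_diff = (_fit_and_integrate(quality_test, log_test, lo, hi)
                - _fit_and_integrate(quality_ref, log_ref, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100  # negative means bitrate savings
```

Note that LPIPS is a lower-is-better metric, so in practice the curves are oriented so that quality increases monotonically (e.g., by negating LPIPS) before fitting; PSNR-based BD-rate uses the values directly.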