Controllable Distortion-Perception Tradeoff Through Latent Diffusion for Neural Image Compression

Authors: Chuqin Zhou, Guo Lu, Jiangchuan Li, Xiangyu Chen, Zhengxue Cheng, Li Song, Wenjun Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimental results demonstrate that our method significantly enhances the pretrained codecs with a wide, adjustable distortion-perception range while maintaining their original compression capabilities. For instance, we can achieve more than 150% improvement in LPIPS-BDRate without sacrificing more than 1 dB in PSNR." The method is evaluated with both distortion metrics and perceptual quality metrics, and all evaluations are performed on full-resolution images.
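The LPIPS-BDRate figure quoted above is a Bjøntegaard delta-rate: the average percentage bitrate change between two rate-distortion curves at equal quality. A minimal NumPy sketch of the standard calculation follows (this is the generic Bjøntegaard method, not the authors' evaluation code; the `bd_rate` function name and the sample curves are illustrative):

```python
import numpy as np

def bd_rate(rates_a, quality_a, rates_b, quality_b):
    """Average % bitrate change of curve B vs. curve A at equal quality.

    Fits a cubic to quality -> log(rate) for each curve, integrates both
    over the overlapping quality range, and converts the mean log-rate
    difference back to a percentage. Negative = B saves bitrate.
    """
    log_a, log_b = np.log(rates_a), np.log(rates_b)
    poly_a = np.polyfit(quality_a, log_a, 3)
    poly_b = np.polyfit(quality_b, log_b, 3)

    # Integrate only where the two curves' quality ranges overlap.
    lo = max(min(quality_a), min(quality_b))
    hi = min(max(quality_a), max(quality_b))
    int_a, int_b = np.polyint(poly_a), np.polyint(poly_b)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_b = (np.polyval(int_b, hi) - np.polyval(int_b, lo)) / (hi - lo)

    return (np.exp(avg_b - avg_a) - 1.0) * 100.0

# Illustrative curves: codec B uses 10% less bitrate at every quality point.
quality = [30.0, 32.0, 34.0, 36.0]          # e.g. PSNR in dB
rates_a = [0.10, 0.20, 0.40, 0.80]          # bits per pixel
rates_b = [r * 0.9 for r in rates_a]
delta = bd_rate(rates_a, quality, rates_b, quality)  # ≈ -10.0
```

For a "lower is better" metric such as LPIPS, the same formula applies after negating the metric so that the quality axis increases with quality.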
Researcher Affiliation | Collaboration | Chuqin Zhou (1), Guo Lu (1)*, Jiangchuan Li (1), Xiangyu Chen (2), Zhengxue Cheng (1), Li Song (1), Wenjun Zhang (1) — (1) Shanghai Jiao Tong University; (2) Institute of Artificial Intelligence (TeleAI), China Telecom
Pseudocode | No | The paper includes architectural diagrams (Figures 1, 2, 3, 4) and mathematical equations, but no structured pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper states: "We limit comparisons to studies with publicly available codes and models for consistent testing and evaluation." This refers to other works; there is no explicit statement or link indicating that the authors' own code for the described methodology is publicly available.
Open Datasets | Yes | "For training, we use a high-quality Flickr2W dataset (Liu et al. 2020), and randomly crop images to a resolution of 256×256. The evaluations are conducted on two common image compression benchmark datasets: the CLIC2020 test set and the Kodak dataset."
Dataset Splits | No | The paper names the training set (Flickr2W, with random 256×256 crops) and the evaluation sets (CLIC2020 test set and Kodak), but it does not provide specific training/validation/test splits for the Flickr2W dataset or specify how the benchmark datasets were used beyond evaluation.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions the AdamW optimizer but does not specify any software libraries or frameworks with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be needed for reproducibility.
Experiment Setup | Yes | "For training, we use a high-quality Flickr2W dataset (Liu et al. 2020), and randomly crop images to a resolution of 256×256. To train the auxiliary encoder and the adaptive latent fusion module, we use the AdamW optimizer with a batch size of 32. The learning rate is maintained at a fixed value of 5×10⁻⁵."
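The reported setup (AdamW, batch size 32, fixed learning rate 5×10⁻⁵, 256×256 crops) can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not release code, so the two `nn.Conv2d` modules below are mere placeholders for the auxiliary encoder and adaptive latent fusion module, and the loss is a dummy.

```python
import torch
from torch import nn

# Placeholder modules standing in for the auxiliary encoder and the
# adaptive latent fusion module described in the paper (architectures unknown).
aux_encoder = nn.Conv2d(3, 8, kernel_size=3, padding=1)
latent_fusion = nn.Conv2d(8, 8, kernel_size=1)

# Hyperparameters as reported: AdamW, fixed lr = 5e-5.
optimizer = torch.optim.AdamW(
    list(aux_encoder.parameters()) + list(latent_fusion.parameters()),
    lr=5e-5,
)

# One step on a dummy batch: batch size 32, random 256x256 crops.
batch = torch.randn(32, 3, 256, 256)
latents = latent_fusion(aux_encoder(batch))
loss = latents.pow(2).mean()  # dummy objective, for illustration only
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

A real training loop would additionally freeze the pretrained codec and optimize a rate-distortion-perception objective, details the paper describes but this sketch does not attempt to reproduce.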