Compressed Image Generation with Denoising Diffusion Codebook Models

Authors: Guy Ohayon, Hila Manor, Tomer Michaeli, Michael Elad

ICML 2025

Reproducibility assessment (variable — result — LLM response):
Research Type — Experimental — "Experiments. While DDPM is equivalent to DDCM with K = ∞, the first question we address is whether DDCM maintains the synthesis capabilities of DDPM for relatively small K values. We compare the performance of DDPM with that of DDCM using K ∈ {2, 4, 8, 16, 64} for sampling from pre-trained pixel- and latent-space models. We compute the Fréchet Inception Distance (FID) (Heusel et al., 2017) to evaluate the generation performance. In App. A we report additional metrics and provide qualitative comparisons."
Researcher Affiliation — Academia — "¹Faculty of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel. ²Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel. Correspondence to: Guy Ohayon <EMAIL>, Hila Manor <EMAIL>."
Pseudocode — No — The paper describes its methods using mathematical equations and textual explanations, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code — Yes — "Code and demo are available on our project's website."
Open Datasets — Yes — "For pixel-space generation, we use a class-conditional DDM trained on ImageNet 256×256 (Deng et al., 2009; Dhariwal & Nichol, 2021)... As the reference dataset, we randomly select 10k images from MS-COCO (Lin et al., 2014; Chen et al., 2015)... We evaluate our compression method on Kodak24 (Franzen, 1999), DIV2K (Agustsson & Timofte, 2017), ImageNet-1K 256×256 (Deng et al., 2009; Pan et al., 2020), and CLIC2020 (Toderici et al., 2020)."
Dataset Splits — Yes — "For pixel-space generation, we use a class-conditional DDM trained on ImageNet 256×256 (Deng et al., 2009; Dhariwal & Nichol, 2021), and apply classifier guidance (CG) (Dhariwal & Nichol, 2021) with unit scale. We use the 50k validation set of ImageNet as the reference dataset, and sample 10k class labels randomly to generate the images. For latent space, we use Stable Diffusion (SD) 2.1 (Rombach et al., 2022) trained on 768×768 images and apply classifier-free guidance (CFG) with scale 3 (equivalent to w = 2 in (Ho & Salimans, 2021)). As the reference dataset, we randomly select 10k images from MS-COCO (Lin et al., 2014; Chen et al., 2015) along with one caption per image, and use those captions as prompts for sampling. ... We compare our approach against the state-of-the-art methods PMRF (Ohayon et al., 2025), DifFace (Yue & Loy, 2024), and BFRffusion (Chen et al., 2024b), using the standard evaluation datasets CelebA-Test (Karras et al., 2018; Wang et al., 2021), LFW-Test (Huang et al., 2008), WebPhoto-Test (Wang et al., 2021), and WIDER-Test (Zhou et al., 2022)."
Hardware Specification — No — The paper does not provide specific details about the hardware used to run the experiments, such as GPU models or CPU types.
Software Dependencies — Yes — "We use a class-conditional ImageNet model (256×256) for pixel space, and the text-conditional SD 2.1 model (768×768) for latent space... We use torch-fidelity (Obukhov et al., 2020) to compute the perceptual quality measures. ... using the Salesforce/blip2-opt-2.7b-coco checkpoint of BLIP-2 from Hugging Face. ... with the OpenAI CLIP ViT-L/14 model (Radford et al., 2021)."
Experiment Setup — Yes — "We compare the performance of DDPM with that of DDCM using K ∈ {2, 4, 8, 16, 64} for sampling from pre-trained pixel- and latent-space models. ... We use a class-conditional ImageNet model (256×256)... and apply classifier guidance (CG) (Dhariwal & Nichol, 2021) with unit scale. ... Stable Diffusion (SD) 2.1 ... and apply classifier-free guidance (CFG) with scale 3 (equivalent to w = 2 in (Ho & Salimans, 2021)). ... We use the FFHQ 512×512 DDM of Yue & Loy (2024) with T = 1000 sampling steps and K = 4096 for all codebooks."
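The quotes above revolve around one mechanism: DDCM runs ordinary DDPM ancestral sampling, except that the Gaussian noise injected at each step is picked from a fixed codebook of K pre-sampled noise vectors, so the sequence of chosen indices is a compact code for the generated image. The following is a minimal, illustrative sketch of that idea on a toy 1-D signal; the linear beta schedule, the placeholder `denoise_fn`, and the uniform-random index selection (which the paper uses for unconditional generation) are simplifying assumptions, not the paper's actual models.

```python
import numpy as np

def ddcm_sample(denoise_fn, shape, T=50, K=8, seed=0):
    """Toy DDCM-style ancestral sampling (illustrative sketch only).

    Identical to DDPM sampling except that the injected noise at each step
    is chosen from a fixed codebook of K pre-sampled Gaussian vectors; the
    chosen indices form a roughly T * log2(K)-bit code for the output.
    """
    rng = np.random.default_rng(seed)            # shared seed => codebooks are reproducible
    betas = np.linspace(1e-4, 0.02, T)           # linear beta schedule (assumption)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)               # x_T ~ N(0, I)
    indices = []                                 # the compressed representation
    for t in range(T - 1, 0, -1):
        eps_hat = denoise_fn(x, t)               # model's noise prediction
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        codebook = rng.standard_normal((K,) + tuple(shape))  # per-step noise codebook
        k = int(rng.integers(K))                 # uniform pick: unconditional generation
        indices.append(k)
        x = mean + np.sqrt(betas[t]) * codebook[k]
    eps_hat = denoise_fn(x, 0)                   # final step: no noise added
    x0 = (x - betas[0] / np.sqrt(1.0 - alpha_bars[0]) * eps_hat) / np.sqrt(alphas[0])
    return x0, indices
```

Because every codebook is regenerated from the shared seed, a decoder holding only the index sequence (and the same denoiser) can replay the exact trajectory, which is what makes the indices a usable compressed code; swapping the uniform pick for a selection rule that matches a target image turns the same loop into compression.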