A Mixture-Based Framework for Guiding Diffusion Models

Authors: Yazid Janati, Badr Moufad, Mehdi Abou El Qassime, Alain Oliviero Durmus, Eric Moulines, Jimmy Olsson

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our approach through extensive experiments on image inverse problems, utilizing both pixel- and latent-space diffusion priors, as well as on source separation with an audio diffusion model. MGDM demonstrates strong empirical performance across 10 image-restoration tasks involving both pixel-space and latent-space diffusion models, as well as in musical source separation, even matching the performance of supervised methods. We evaluate MGDM on image inverse problems using both pixel-space and latent-space diffusion, as well as on musical source separation tasks. For the pixel-space diffusion and the audio diffusion model, we compare MGDM against seven competitors. We report the LPIPS metric (Zhang et al., 2018) in Tables 1 and 2 and defer the complete tables with FID, PSNR and SSIM alongside 95% confidence intervals to Table 6, Table 7, and Table 8. The SI-SDRi metric measures the improvement between the original audio source x_i and the generated source x̂_i, relative to the mixture baseline y.
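For reference, the SI-SDRi metric quoted above can be sketched in a few lines of pure Python. The helper names (`si_sdr`, `si_sdri`) are illustrative, not taken from the paper's codebase; this follows the standard scale-invariant SDR definition of Le Roux et al. (2019).

```python
import math

def si_sdr(est, ref, eps=1e-12):
    """Scale-invariant SDR (in dB) between an estimated and a reference signal."""
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    alpha = dot / ref_energy                      # optimal scaling of the reference
    target = [alpha * r for r in ref]             # projection of est onto the reference
    noise = [e - t for e, t in zip(est, target)]  # residual error
    num = sum(t * t for t in target)
    den = sum(n * n for n in noise) + eps
    return 10.0 * math.log10(num / den + eps)

def si_sdri(est, ref, mixture):
    """Improvement over using the raw mixture itself as the estimate."""
    return si_sdr(est, ref) - si_sdr(mixture, ref)
```

A good separated source yields a positive SI-SDRi, since it is closer to the reference than the mixture baseline is.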
Researcher Affiliation | Academia | ¹École polytechnique, ²KTH Royal Institute of Technology. Correspondence to: Yazid Janati, Badr Moufad <firstEMAIL>.
Pseudocode | Yes | Algorithm 1 Gibbs sampler targeting (13) ... Algorithm 2 MIXTURE-GUIDED DIFFUSION MODEL ... Algorithm 3 Gauss_VI routine ... Algorithm 4 Gibbs sampler targeting (13)
Open Source Code | No | Our code will be made available upon acceptance of the paper.
Open Datasets | Yes | We evaluate our method on a diverse set of six linear inverse problems and four nonlinear inverse problems with three different image priors at 256×256 resolution: the pixel-space FFHQ model of Choi et al. (2021), the latent-space FFHQ of Rombach et al. (2022), and the ImageNet model of Dhariwal & Nichol (2021). The evaluation is conducted on the publicly available Slakh2100 test dataset (Manilow et al., 2019) with the scale-invariant SDR improvement (SI-SDRi) metric (Roux et al., 2019).
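The linear inverse problems referenced above share the form y = A x + noise, where A is a known degradation operator. A minimal sketch with a hypothetical inpainting (masking) operator illustrates the setup; the operator and variable names are illustrative, not the paper's code.

```python
import random

def inpainting_operator(mask):
    """Build a linear degradation A that zeroes out masked-away pixels."""
    def apply(x):
        return [xi if keep else 0.0 for xi, keep in zip(x, mask)]
    return apply

random.seed(0)
x = [random.random() for _ in range(8)]                   # clean "image" (flattened)
mask = [True, True, False, True, False, True, True, True]  # False = missing pixel
A = inpainting_operator(mask)
noise_std = 0.05
y = [axi + random.gauss(0.0, noise_std) for axi in A(x)]  # observation y = A x + n
```

A guided diffusion method then samples from the posterior over x given y and the prior defined by the diffusion model.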
Dataset Splits | Yes | The evaluation is done on a subset of 300 validation images per dataset. For FFHQ, we use the first 300 images, while for ImageNet, we randomly sample 300 images to avoid class bias. We report the LPIPS metric (Zhang et al., 2018) in Tables 1 and 2 and defer the complete tables with FID, PSNR and SSIM alongside 95% confidence intervals to Table 6, Table 7, and Table 8. For the phase retrieval task specifically, we draw 4 samples for each algorithm and keep only the best scoring one in terms of LPIPS. A similar strategy is used in (Chung et al., 2023; Zhang et al., 2024; Wu et al., 2024). [...] Tracks from the test dataset are evaluated using a sliding window approach with 4-second chunks and a 2-second overlap.
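The sliding-window evaluation (4-second chunks, 2-second overlap) implies a hop of 2 seconds between window starts. A small sketch of the chunk-boundary computation; the function name and the end-alignment of the final window are assumptions, not details from the paper.

```python
def chunk_starts(total_sec, win_sec=4.0, hop_sec=2.0):
    """Start times of sliding windows covering [0, total_sec] with overlap win - hop."""
    starts = []
    t = 0.0
    while t + win_sec < total_sec:
        starts.append(t)
        t += hop_sec
    # align the final window to the end of the track so no audio is dropped
    starts.append(max(total_sec - win_sec, 0.0))
    return starts
```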
Hardware Specification | Yes | All experiments were conducted on NVIDIA Tesla V100 SXM2 GPUs.
Software Dependencies | No | The denoiser network is based on a non-latent, time-domain unconditional variant of (Schneider et al., 2023). Its architecture follows a U-Net design, comprising an encoder, bottleneck, and decoder. Training is performed on the four stacked instruments using the publicly available trainer from repository2. (footnote 2: https://github.com/archinetai/audio-diffusion-pytorch-trainer) ... In the anonymous codebase provided as a companion to the paper, we use ᾱ_t instead of α_t to match the conventions of existing codebases.
Experiment Setup | Yes | The details about the hyperparameters of MGDM are reported in Table 5. We adjust the optimization of the Gaussian variational approximation in Algorithm 3 during the first and last diffusion steps. We ramp up the number of gradient steps during the final diffusion steps. This allows us to substantially improve the fine-grained details of the reconstructions. Similarly, we reduce the learning rate in the early steps to alleviate potential instabilities. We tune the parameters of our algorithm per dataset and not per task.
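The row above describes ramping up the number of gradient steps near the end of sampling and lowering the learning rate early on. A toy schedule capturing that shape; the thresholds, base values, and function name are illustrative assumptions, not the paper's tuned settings from Table 5.

```python
def vi_schedule(step, n_steps, base_steps=5, base_lr=1e-2):
    """Per-diffusion-step (n_grad_steps, lr) for a Gaussian VI routine.

    Early steps: reduced learning rate to alleviate instabilities.
    Final steps: extra gradient steps to refine fine-grained details.
    """
    frac = step / n_steps                # 0 at the start of sampling, 1 at the end
    if frac < 0.1:                       # early diffusion steps
        return base_steps, base_lr * 0.1
    if frac > 0.9:                       # final diffusion steps: ramp up effort
        ramp = (frac - 0.9) / 0.1
        return base_steps + int(ramp * 3 * base_steps), base_lr
    return base_steps, base_lr
```

Tuning such a schedule per dataset rather than per task, as the paper reports, keeps the number of free hyperparameters small.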