Repulsive Latent Score Distillation for Solving Inverse Problems

Authors: Nicolas Zilberstein, Morteza Mardani, Santiago Segarra

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on linear and nonlinear inverse tasks with high-resolution images (512 × 512) using pre-trained Stable Diffusion models demonstrate the effectiveness of our approach.
Researcher Affiliation Collaboration Nicolas Zilberstein, Rice University, EMAIL; Morteza Mardani, NVIDIA Inc., EMAIL; Santiago Segarra, Rice University, EMAIL
Pseudocode Yes The algorithm is shown in Algorithm 1; we define sg[·] as the stop-gradient operator to emphasize that the term inside it is not differentiated during the optimization step. Non-Aug-RLSD: this method corresponds to our variant using the particle-based variational approximation (8) without augmentation. For clarity, we show it in Alg. 2.
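The sg[·] (stop-gradient) operator referenced in Algorithm 1 can be illustrated with a toy forward-mode autodiff sketch. This is purely illustrative, not the authors' implementation; in real frameworks the same effect comes from `tensor.detach()` in PyTorch or `jax.lax.stop_gradient` in JAX.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """Forward-mode dual number: carries a value and its derivative."""
    val: float  # primal value
    dot: float  # derivative w.r.t. the input variable

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sg(x: Dual) -> Dual:
    """Stop-gradient: pass the value through, block the derivative."""
    return Dual(x.val, 0.0)

x = Dual(3.0, 1.0)   # seed derivative dx/dx = 1
plain = x * x        # d/dx x^2 = 2x -> 6 at x = 3
stopped = sg(x) * x  # sg(x) treated as a constant: derivative is x -> 3
```

Both expressions have the same forward value (9.0), but the gradient flowing through `sg(x) * x` is halved because one factor is frozen, which is exactly why the paper wraps non-optimized terms in sg[·].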
Open Source Code Yes The code is available at GitHub. To facilitate reproducibility, we share an anonymous link of our source code: https://file.io/iQNq3U5GpsY6. If the paper is accepted, we will publish it in a public repository.
Open Datasets Yes We consider 100 samples from the validation set of FFHQ (Karras et al., 2019) used in Chung et al. (2022a). We use the free masks (10%–20%) from Saharia et al. (2022). For this experiment, we consider ImageNet (Russakovsky et al., 2015) to demonstrate that our method outperforms its baselines on other datasets. We consider 75 images from the COCO dataset and compute average qualities and diversities across all images. For our augmented formulation, we use Stable Diffusion trained on the LAION (Schuhmann et al., 2022) dataset as its pre-trained model.
Dataset Splits Yes We consider 100 samples from the validation set of FFHQ (Karras et al., 2019) used in Chung et al. (2022a).
Hardware Specification Yes We ran the experiments on a single NVIDIA A100 GPU with 80GB.
Software Dependencies No The paper mentions 'Adam (Kingma, 2014)' as an optimizer and 'Stable Diffusion models' as pre-trained models. It also mentions 'DeepFloyd IF-XL-v1.0' for text-to-3D generation. However, it does not provide version numbers for general software dependencies such as the programming language (e.g., Python), the deep learning framework (e.g., PyTorch, TensorFlow), or other libraries used for implementation.
Experiment Setup Yes Unless we state otherwise, we consider 1000 steps (the full denoising trajectory) for all the cases. We consider Adam (Kingma, 2014) in the optimization steps (lines 7 and 9 in Alg. 1) and set the momentum pair (0.9, 0.99). We randomly initialize variables x and z and generate a batch of N = 4 particles per noisy measurement. Regarding the pre-trained model, we consider Stable Diffusion v2.1, although other latent diffusion models can be used. As diversity metric, we evaluate the pairwise diversity as 1 − cosine similarity between the N particles. Lastly, for the kernel function, we consider an RBF: k(z_i, z_j) = exp(−||g_DINO(z_i) − g_DINO(z_j)||² / h_t), where h_t = m_t² / log N, m_t is the median particle distance (Liu and Wang, 2016), and g_DINO is a pre-trained neural network (Caron et al., 2021). For the hyperparameters, we set λ = 0.14, ρ = 0.075, lr_x = 0.4 and lr_z = 0.8 (for inpainting). We set λ = 0.2, ρ = 0.05, lr_x = 0.4 and lr_z = 0.6 (for super resolution). We set λ = 0.007, ρ = 0.01, lr_x = 0.4 and lr_z = 0.3, and L = 500 (for motion deblurring).
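The diversity metric (1 − cosine similarity across particles) and the RBF kernel with the median-heuristic bandwidth h_t = m_t² / log N can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code: random feature vectors stand in for the g_DINO embeddings, and `pairwise_diversity` and `rbf_kernel` are hypothetical helper names.

```python
import numpy as np

def pairwise_diversity(features):
    """Mean of (1 - cosine similarity) over all unordered particle pairs."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    cos = f @ f.T                        # (N, N) cosine-similarity matrix
    iu = np.triu_indices(len(features), k=1)
    return np.mean(1.0 - cos[iu])

def rbf_kernel(features):
    """k(z_i, z_j) = exp(-||z_i - z_j||^2 / h_t), h_t = m_t^2 / log N."""
    diff = features[:, None, :] - features[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)      # squared pairwise distances
    iu = np.triu_indices(len(features), k=1)
    m_t = np.median(np.sqrt(d2[iu]))     # median particle distance
    h_t = m_t ** 2 / np.log(len(features))
    return np.exp(-d2 / h_t)

# N = 4 particles, as in the paper's setup; 16-dim stand-in embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))
K = rbf_kernel(feats)
div = pairwise_diversity(feats)
```

The median heuristic adapts the bandwidth to the current spread of the particles each step, so the repulsion term neither vanishes (particles collapsed) nor dominates (particles far apart).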