Boomerang: Local sampling on image manifolds using diffusion models
Authors: Lorenzo Luzi, Paul M Mayer, Josue Casco-Rodriguez, Ali Siahkoohi, Richard Baraniuk
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present three applications for local sampling using Boomerang. First, we provide a framework for constructing privacy-preserving datasets having controllable degrees of anonymity. Second, we show that using Boomerang for data augmentation increases generalization performance and outperforms state-of-the-art synthetic data augmentation. Lastly, we introduce a perceptual image enhancement framework powered by Boomerang, which enables resolution enhancement. |
| Researcher Affiliation | Academia | Lorenzo Luzi, Paul M. Mayer, Josue Casco-Rodriguez, Ali Siahkoohi, Richard G. Baraniuk (all Rice University) |
| Pseudocode | Yes | Algorithm 1 (Boomerang local sampling, given a diffusion model f_φ(x, t)). Input: x_0, t_Boom, {ᾱ_t}_{t=1}^T, {β_t}_{t=1}^T. Output: x̃_0. Sample ε ∼ N(0, I) and set x_{t_Boom} ← √(ᾱ_{t_Boom}) x_0 + √(1 − ᾱ_{t_Boom}) ε. For t = t_Boom, …, 1: if t > 1, set β̃_t ← ((1 − ᾱ_{t−1})/(1 − ᾱ_t)) β_t and sample η ∼ N(0, β̃_t I); else η ← 0. Update x_{t−1} ← f_φ(x_t, t) + η. Return x̃_0. |
| Open Source Code | Yes | A Boomerang Colab demo is available at https://colab.research.google.com/drive/1PV5Z6b14HYZNx1lHCaEVhId-Y4baKXwt. |
| Open Datasets | Yes | To show the versatility of Boomerang anonymization, we apply it to several datasets such as the LFWPeople (Huang et al., 2007), CelebA-HQ (Karras et al., 2018), CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), FFHQ (Karras et al., 2019), and ILSVRC2012 (ImageNet) (Russakovsky et al., 2015) datasets. |
| Dataset Splits | No | The paper mentions using well-known datasets such as CIFAR-10, CIFAR-100, ImageNet-200, and ImageNet for classification tasks. It presents results like 'Top-1 Test Accuracy' and 'Top-5 Test Accuracy' (Table 1, Table 2), implying the use of test sets. However, it does not explicitly state the training/validation/test splits used in its own experimental setup, nor does it cite a reference for them; it implicitly assumes standard splits without providing the details required. |
| Hardware Specification | Yes | These times are reported for a single Nvidia GeForce GTX Titan X GPU |
| Software Dependencies | No | The paper mentions various models and frameworks like Stable Diffusion, Patched Diffusion, DLSM, FastDPM, StyleGAN-XL, ResNet-18, VGG-Face, FaceNet, and AlexNet. However, it does not provide specific version numbers for any of these software components, programming languages, or libraries, which is necessary for reproducibility. |
| Experiment Setup | Yes | When generating Boomerang samples for data anonymization or augmentation, we pick t_Boom so that the Boomerang samples look visually different than the original samples. With the FastDPM model we use t_Boom/T = 40/100 = 40%; with Patched Diffusion, we use t_Boom/T = 75/250 = 30%; and with DLSM, we use t_Boom/T = 250/1000 = 25%. We then randomly choose to use the training data or the Boomerang-generated data with probability 0.5 at each epoch. We use ResNet-18 (He et al., 2016) for our experiments. Empirical tests showed that setting t_Boom = 100 on the Patched Diffusion Model (out of T = 250) produced a good balance between sharpness and the features of the ground-truth image, as seen in Appendix A.2. |
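The Algorithm 1 excerpt above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: `f_phi` stands in for the diffusion model's one-step reverse map, and `alpha_bar`/`beta` are the usual DDPM noise schedules.

```python
import numpy as np

def boomerang_sample(f_phi, x0, t_boom, alpha_bar, beta, rng=None):
    """Boomerang local sampling, per Algorithm 1 (sketch).

    f_phi(x, t): placeholder for the diffusion model's one-step reverse map.
    alpha_bar[t]: cumulative product of (1 - beta[s]) for s <= t.
    """
    rng = rng or np.random.default_rng()
    # Forward: jump part-way up the noise chain (t_boom < T), not all the way.
    eps = rng.standard_normal(x0.shape)
    x = np.sqrt(alpha_bar[t_boom]) * x0 + np.sqrt(1.0 - alpha_bar[t_boom]) * eps
    # Reverse: denoise back down, adding the DDPM posterior noise at each step.
    for t in range(t_boom, 0, -1):
        if t > 1:
            beta_tilde = (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * beta[t]
            eta = np.sqrt(beta_tilde) * rng.standard_normal(x.shape)
        else:
            eta = 0.0
        x = f_phi(x, t) + eta
    return x
```

Because the chain starts at t_Boom rather than T, the result stays in a neighborhood of x_0 on the image manifold; larger t_Boom yields more variation.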
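The per-epoch augmentation rule quoted above (train on either the real data or the Boomerang-generated data, chosen with probability 0.5 each epoch) can be sketched as follows; the loader arguments and function name are illustrative, not from the paper.

```python
import random

def pick_epoch_data(real_loader, boomerang_loader, p_boomerang=0.5, seed=None):
    """Choose this epoch's data source with a single coin flip (sketch).

    With probability p_boomerang, the whole epoch trains on
    Boomerang-generated samples; otherwise it trains on the real data.
    """
    coin = random.Random(seed)
    use_boomerang = coin.random() < p_boomerang
    return boomerang_loader if use_boomerang else real_loader
```

Flipping once per epoch (rather than per batch) matches the paper's description and keeps each epoch's gradient statistics tied to a single data source.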