Simple Guidance Mechanisms for Discrete Diffusion Models

Authors: Yair Schiff, Subham Sahoo, Hao Phung, Guanghan Wang, Sam Boshar, Hugo Dalla-torre, Bernardo Almeida, Alexander Rush, Thomas Pierrot, Volodymyr Kuleshov

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we demonstrate that our guidance mechanisms combined with uniform noise diffusion improve controllable generation relative to autoregressive and diffusion baselines on several discrete data domains, including genomic sequences, small molecule design, and discretized image generation. Code to reproduce our experiments is available here." (Section 5, Experiments)
Researcher Affiliation | Collaboration | Department of Computer Science, Cornell University; Hugo Dalla-Torre, Sam Boshar, Bernardo P. de Almeida, & Thomas Pierrot, InstaDeep
Pseudocode | No | The paper describes methods mathematically and textually but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' section, nor does it present structured algorithmic steps in a dedicated block. Algorithm 2 from Gruver et al. (2024) is referenced, but it is an external algorithm, not one presented within this paper.
Open Source Code | Yes | "Code to reproduce our experiments is available here." (referring to the footnote link to GitHub: https://github.com/YairSchiff/discrete_guidance)
Open Datasets | Yes | "Datasets For our language modeling experiments, we examine several discrete domains: reference genomes from ten species (Species10), the QM9 small molecule dataset (Ruddigkeit et al., 2012; Ramakrishnan et al., 2014), where molecules are represented by SMILES strings (Weininger, 1988), CIFAR10 discretized images (Krizhevsky et al., 2009), and three NLP datasets consisting of text8 (Mahoney, 2011), Amazon Review (McAuley & Leskovec, 2013; Zhang et al., 2015), and the one billion words dataset (LM1B; Chelba et al., 2014)." Species10 is made available at https://huggingface.co/datasets/yairschiff/ten_species and QM9 at https://huggingface.co/datasets/yairschiff/qm9. text8 was downloaded from http://mattmahoney.net/dc/text8.zip, Amazon Review from https://huggingface.co/datasets/fancyzhx/amazon_polarity, and LM1B from https://huggingface.co/datasets/billion-word-benchmark/lm1b.
Dataset Splits | Yes | "Training and validation sets were randomly split using 95% / 5%" (Species10 and QM9). "We use the provided train and test splits of 50,000 training images and 10,000 validation images" (CIFAR10). "The first 90M characters were used for the training set and the final 5M characters were used as a validation set" (text8). "Train and validation splits were used from the downloaded data, with 3.6M sequences in the training data and 400k sequences in the validation set" (Amazon Review). "After chunking the data, our training set consisted of 7M sequences and our validation set consisted of 72k sequences" (LM1B).
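As a minimal sketch of the 95% / 5% random split quoted for Species10 and QM9 above: the function name, seed, and shuffling details below are illustrative assumptions, not taken from the paper's released code.

```python
import random

def split_train_val(records, val_frac=0.05, seed=0):
    """Randomly split records into train/validation sets.

    Hypothetical helper illustrating a 95% / 5% random split;
    the seed and exact shuffling procedure are assumptions.
    """
    rng = random.Random(seed)
    idx = list(range(len(records)))
    rng.shuffle(idx)
    n_val = max(1, int(len(records) * val_frac))
    val_idx = set(idx[:n_val])
    train = [r for i, r in enumerate(records) if i not in val_idx]
    val = [r for i, r in enumerate(records) if i in val_idx]
    return train, val

# Example: 100 records -> 95 train / 5 validation
train, val = split_train_val(list(range(100)))
```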
Hardware Specification | Yes | Table 13 lists GPU types A5000, A6000, and A100. For example, Species10 used 8 A5000 GPUs, QM9 used 4 A6000 GPUs, and CIFAR10 used 8 A100 GPUs.
Software Dependencies | No | The paper mentions software such as the RDKit library (Landrum et al., 2013) and GPT-2 Large (Radford et al., 2019), but does not provide specific version numbers for these or any other key software components used in the experiments (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | "In Table 13, we detail the hyperparameter setup for each of the language modeling experiments in Section 5.1." The table gives specific values for 'Train steps', 'Context size', 'Batch size', 'LR', 'Optim.' (0.9, 0.999), 'LR sched.' (cosine decay), and 'LR warmup steps' for each dataset: Species10, QM9, CIFAR10, text8, Amazon, and LM1B.
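As a rough illustration of the 'LR warmup steps' plus 'Cosine decay' schedule quoted from Table 13 above, a linear-warmup-then-cosine-decay rule might look like the sketch below; the exact warmup shape and the zero floor LR are assumptions, not details confirmed by the paper.

```python
import math

def lr_at_step(step, max_lr, warmup_steps, total_steps):
    """Linear warmup to max_lr, then cosine decay to zero.

    Hypothetical schedule illustrating the 'LR warmup steps' +
    'Cosine decay' entries; the warmup shape and final LR of 0
    are assumptions.
    """
    if step < warmup_steps:
        # Linearly ramp the LR up over the warmup phase.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

For example, with `max_lr=1e-3`, 100 warmup steps, and 1,000 total steps, the LR reaches its peak at the end of warmup and decays to zero by the final step.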