Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
Authors: Hunter Nisonoff, Junhao Xiong, Stephan Allenspach, Jennifer Listgarten
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the effectiveness of Discrete Guidance by applying it to a broad set of discrete state-space conditional generation tasks, including small molecules, DNA sequences, and protein sequences. 6 EMPIRICAL INVESTIGATIONS We deployed Discrete Guidance on three conditional generation tasks spanning problems on small molecules, DNA sequences and protein sequences. |
| Researcher Affiliation | Academia | Hunter Nisonoff, Junhao Xiong, Stephan Allenspach, Jennifer Listgarten; University of California, Berkeley |
| Pseudocode | Yes | D IMPLEMENTATION DETAILS In this section we provide implementation details of Discrete Guidance, both as algorithmic summary and PyTorch code. First, we describe the procedure for training a discrete state-space flow model (DFM) under a masking process as detailed in Campbell et al. (2024) (Algorithm 1). Then, we describe the procedure for obtaining the guide-adjusted rates (Algorithm 2), both with exact calculations (Listing 1) and Taylor-approximated guidance (TAG) (Listing 2). Finally, we also provide minimal implementations for how Discrete Guidance can be integrated with the sampling of a DFM with masking process (Algorithm 3; predictor guidance in Listing 3; predictor-free guidance in Listing 4). |
| Open Source Code | Yes | REPRODUCIBILITY STATEMENT We have included detailed derivations of the results we presented in Appendix C. Appendix D provides minimal PyTorch implementations of Discrete Guidance and source code is available at https://github.com/hnisonoff/discrete_guidance. |
| Open Datasets | Yes | Our dataset, consisting of 610,575 unique molecules... is based on QMugs (Isert et al., 2021) and has been constructed as described in Appendix F.2.1. We used the same enhancer sequence dataset as Stark et al. (2024), comprising 104k DNA sequences... (Janssens et al., 2022; Taskiran et al., 2024). We trained our model using the data provided by Tsuboyama et al. (2023), specifically the file (Tsuboyama2023_Dataset2_Dataset3_20230416.csv). Following Campbell et al. (2022), we modeled a CIFAR-10 image dataset as discrete pixels. |
| Dataset Splits | Yes | After these filtering steps, 610,575 unique molecules remained in the dataset, which was then randomly split into a train set and a holdout set with a ratio of 4:1. We created a validation set by clustering all of the data based on the wild-type using the WT cluster column. We trained the noisy classifier for 200 epochs on the same training set as the denoising model... We used accuracy evaluated on the validation set for early stopping. |
| Hardware Specification | Yes | We trained on one RTX 6000A GPU. |
| Software Dependencies | No | The paper mentions software such as RDKit, PyTorch (implicitly, via its "PyTorch implementations"), and the SciPy Python package, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For the unconditional denoising model, we trained for 400 epochs and used FBD on the validation set for early stopping, with a learning rate of 0.0005 and 500 linear warmup steps. For the conditional denoising network used in PFG, we trained with a conditioning ratio (the fraction of times the model is trained with a class label as input instead of the no-class token as input) of 0.7 for 300 epochs and used validation loss for early stopping. For DiGress, we trained a discrete-time, discrete diffusion model using the same training hyperparameters as the flow-matching models used in Discrete Guidance, using 100 discrete time steps. |
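The guide-adjusted rates mentioned in the Pseudocode row (Algorithm 2, exact guidance) amount to reweighting each CTMC transition rate by a ratio of predictor probabilities. The sketch below is a minimal NumPy illustration under assumed shapes and names; the paper's actual implementation (Listings 1-2) operates on PyTorch tensors and also supports the Taylor-approximated variant (TAG).

```python
import numpy as np

def guide_adjusted_rates(rates, log_p_y_given_x, log_p_y_given_xprime, gamma=1.0):
    """Reweight unconditional jump rates by predictor probability ratios.

    rates:                (S,) unconditional rates from the current state x
                          to each of S candidate jump states x'.
    log_p_y_given_x:      scalar log p(y | x) under the noisy predictor.
    log_p_y_given_xprime: (S,) log p(y | x') for each candidate state x'.
    gamma:                guidance strength (1.0 gives exact guidance).
    """
    log_ratio = log_p_y_given_xprime - log_p_y_given_x
    return rates * np.exp(gamma * log_ratio)

# Candidates where the predictor assigns the target label y a higher
# probability than at the current state get amplified rates; the rest
# are suppressed.
r = guide_adjusted_rates(
    rates=np.array([0.5, 0.5, 0.5]),
    log_p_y_given_x=np.log(0.2),
    log_p_y_given_xprime=np.log(np.array([0.2, 0.4, 0.1])),
)
```

Here the second candidate doubles the predictor probability (0.4 vs. 0.2), so its rate doubles to 1.0, while the third halves it, so its rate drops to 0.25.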
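The 4:1 random train/holdout split from the Dataset Splits row can be reproduced in outline as follows. The function name and seed are placeholders for illustration, not details from the paper.

```python
import numpy as np

def random_split(n_items, holdout_frac=0.2, seed=0):
    """Shuffle item indices and split them 4:1 into train and holdout sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_items)
    n_holdout = int(round(n_items * holdout_frac))
    return idx[n_holdout:], idx[:n_holdout]

# The paper's molecule dataset has 610,575 unique entries.
train_idx, holdout_idx = random_split(610_575)
```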