Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models
Authors: Najwa Laabid, Severi Rissanen, Markus Heinonen, Arno Solin, Vikas Garg
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess the effect of aligned denoisers on the performance of regular discrete diffusion models. To this effect, we first demonstrate our model on a toy example: copying simple graphs. We then evaluate our method more rigorously on the real-world task of retrosynthesis, i.e., predicting precursor molecules for a given target compound. Our experimental results show that a combination of our alignment methods achieves SOTA-matching results on retrosynthesis. |
| Researcher Affiliation | Collaboration | Najwa Laabid¹, Severi Rissanen¹, Markus Heinonen¹, Arno Solin¹, Vikas Garg¹·². ¹Department of Computer Science, Aalto University. ²Yai Yai Ltd. |
| Pseudocode | Yes | The training and sampling procedures with graph diffusion models are presented in Alg. 1 and Alg. 2, along with optional conditioning on PY X, as described in Sec. 3.3. |
| Open Source Code | Yes | Code is available at https://github.com/Aalto-QuML/DiffAlign. |
| Open Datasets | Yes | We use the benchmark dataset USPTO-50k for our experiments. The dataset consists of 50000 chemical reactions, in SMILES format (Weininger, 1988), curated by Schneider et al. (2016) from an original 2 million reactions extracted through text mining by Lowe (2012). More information on the benchmark dataset USPTO and its various subsets can be found in App. C.2. |
| Dataset Splits | Yes | We use the benchmark dataset USPTO-50k for our experiments. [...] More information on the benchmark dataset USPTO and its various subsets can be found in App. C.2. [...] Table A3: USPTO-50k subsets used in retrosynthesis [...] 50k Schneider et al. (2016) 50 016 Dai et al. (2019) [...] We trained the models for 400–600 epochs and chose the best checkpoint based on the Mean Reciprocal Rank (MRR; Liu et al., 2009) score with T = 10 on the validation set. |
| Hardware Specification | Yes | These models were trained for approximately 600 epochs with a single A100/V100/AMD MI250x GPU. For the model where alignment is done by concatenating Y along the feature dimension in the input, the attention map sizes were smaller and we could fit a larger batch of 32 with a single V100 GPU. [...] Sampling 100 samples for one product with T = 100 from the model takes roughly 60 seconds with the current version of our code with an AMD MI250x GPU, and 100 samples with T = 10 takes correspondingly about 6 seconds. |
| Software Dependencies | No | For the chiral tags, we take the ground-truth SMILES for the product molecules from the dataset and assign the corresponding chiral tag to the corresponding atom mapping on the generated reactants. For cis/trans isomerism, we use the Chem.rdchem.BondDir bond field in RDKit molecules and transfer it to the reactant side based on the atom mapping of the pair of atoms at the start and end of the bond. (The text mentions RDKit but does not provide a version number, and no other software dependencies are listed with version numbers.) |
| Experiment Setup | Yes | Our denoiser is implemented as a Graph Transformer (Dwivedi & Bresson, 2021), based on the implementation of Vignac et al. (2023), with additional graph-level features added to the input of the model. [...] In all of our models, we use 9 Graph Transformer layers. When using Laplacian positional encodings, we take the 20 eigenvectors of the graph Laplacian matrix with the largest eigenvalues and assign to each node a 20-dimensional feature vector. We use a maximum of 15 blank nodes... we weigh the edge components in the cross-entropy loss by a factor of 5 compared to the node components. We used a batch size of 16 [...] a larger batch of 32 with a single V100 GPU. This model was trained for 600 epochs. |
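The Laplacian positional encodings described in the Experiment Setup row (eigenvectors of the graph Laplacian with the largest eigenvalues, one k-dimensional feature vector per node) can be sketched as follows. This is a minimal illustration assuming a dense NumPy adjacency matrix; the function name `laplacian_pe` and the 4-node toy graph are ours, not from the paper's released code, and the paper uses k = 20 while the toy graph here only supports a smaller k.

```python
import numpy as np

def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Positional encoding from eigenvectors of the graph Laplacian L = D - A.

    np.linalg.eigh returns eigenvalues in ascending order, so taking the
    last k columns selects the eigenvectors with the largest eigenvalues,
    matching the description in the paper's setup.
    """
    deg = np.diag(adj.sum(axis=1))      # degree matrix D
    lap = deg - adj                     # unnormalized Laplacian L
    _, eigvecs = np.linalg.eigh(lap)    # orthonormal eigenvectors, ascending eigenvalues
    return eigvecs[:, -k:]              # one k-dim feature vector per node

# Toy example: a 4-node path graph (edges 0-1, 1-2, 2-3)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
pe = laplacian_pe(adj, k=2)  # shape (4, 2): a 2-dim encoding per node
```

In practice these per-node vectors are concatenated to the node features before the first Graph Transformer layer; sign ambiguity of eigenvectors is a known caveat of this encoding.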