Any-Property-Conditional Molecule Generation with Self-Criticism using Spanning Trees

Authors: Alexia Jolicoeur-Martineau, Aristide Baratin, Kisoo Kwon, Boris Knyazev, Yan Zhang

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We run four sets of experiments. First, we show that our model can generate molecules conditioned on properties from the test set with high fidelity. Second, we show that our model can efficiently generate (high % of novel, unique, and valid) molecules with high fidelity on out-of-distribution (OOD) properties. Third, we show that our model can produce molecules that maximize a reward function, achieving similar or better performance compared to online learning methods using offline learning. Finally, we show that our model can generate high fidelity molecules conditioned on out-of-distribution (OOD) properties on a small dataset of larger and more complex molecules. We provide ablations for the various STGG+ components on OOD properties for Zinc (Table 5).
Researcher Affiliation Industry Alexia Jolicoeur-Martineau EMAIL Samsung SAIL Montréal Aristide Baratin EMAIL Samsung SAIL Montréal Kisoo Kwon EMAIL Artificial Intelligence Center, Device Solutions, Samsung Electronics Boris Knyazev EMAIL Samsung SAIL Montréal Yan Zhang EMAIL Samsung SAIL Montréal
Pseudocode Yes A.11 Algorithms

Algorithm 1 STGG+ Training
Require: Dataset D = {(x_i, y_i)} where x_i is a molecule and y_i ∈ R^D are its properties
Require: Transformer model f_θ
1: while not converged do
2:   Sample batch (x_1, ..., x_B) and properties (y_1, ..., y_B)
3:   for each molecule x_i in batch do
4:     Tokenize x_i into sequence (t_1, ..., t_L)
5:     Mask a random subset of m properties from y_i = (y_i1, ..., y_iD), where m ~ Uniform(0, D)
6:     Compute (h_1, ..., h_L), where h_j ← f_θ(t_j | y_i, t_1, ..., t_{j−1})
7:     Compute the cross-entropy loss L_CE
8:     Compute the auxiliary property prediction loss L_prop = (1/2) ||f_θ^pred(h_L) − y_i||²_2
9:     Update θ using gradient descent on L = L_CE + λ L_prop

Algorithm 2 STGG+ Sampling with Self-Criticism
Require: Target properties y_target ∈ R^T
Require: Guidance strength w, number of candidates K, max length L_max
Require: Transformer model f_θ
1: Generate K candidate molecules:
2: for k = 1 to K do
3:   Initialize the sequence: t_1 ← [BOS]
4:   Sample guidance scale w ~ U(0.5, 2) (optional; otherwise w = 1 means no guidance)
5:   for j = 2 to L_max do
6:     Compute conditional logits z_c = f_θ(t_j | y_target, t_1, ..., t_{j−1})
7:     Compute unconditional logits z_u = f_θ(t_j | ∅, t_1, ..., t_{j−1})
8:     Apply Classifier-Free Guidance (CFG): z = w·z_c + (1 − w)·z_u
9:     Mask invalid tokens (valency violations, syntax errors, ring overflow, etc.)
10:    Sample the next token t_j ~ Softmax(z)
11:    if t_j = [EOS] then
12:      break
13:  Store the molecule s_k
14: Self-criticism:
15: for each candidate s_k do
16:   Process s_k through f_θ with empty properties to predict ŷ_k
17:   Compute distance d_k = ||ŷ_k − y_target||²_2
18: return s_j where j ← argmin_k d_k  {Select the best candidate}
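The two sampling-time operations in Algorithm 2 that are independent of the Transformer itself can be sketched in plain Python: the classifier-free-guidance logit combination (line 8) and the self-criticism candidate selection (lines 14-18). This is a minimal illustrative sketch, not the authors' implementation; function names and the list-based logit representation are assumptions.

```python
import math

def cfg_logits(cond_logits, uncond_logits, w):
    """Classifier-free guidance: z = w*z_c + (1 - w)*z_u (Algorithm 2, line 8).
    With w = 1 this reduces to the conditional logits (no guidance)."""
    return [w * zc + (1 - w) * zu for zc, zu in zip(cond_logits, uncond_logits)]

def softmax(z):
    """Numerically stable softmax over a list of logits (line 10)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def select_best(candidates, predicted_props, target_props):
    """Self-criticism: keep the candidate whose predicted properties have the
    smallest squared L2 distance to the target (Algorithm 2, lines 14-18)."""
    def sq_dist(y_hat):
        return sum((a - b) ** 2 for a, b in zip(y_hat, target_props))
    best_k = min(range(len(candidates)), key=lambda k: sq_dist(predicted_props[k]))
    return candidates[best_k]
```

Note that because the same model both generates and scores candidates (via the auxiliary property-prediction head trained in Algorithm 1), no external property predictor is needed at sampling time.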
Open Source Code No The paper does not provide concrete access to source code for the methodology described. It mentions using third-party software such as PyTorch, Molecular Sets (MOSES), and RDKit, but does not state that its own implementation is open-source or available.
Open Datasets Yes We experiment with six datasets: (1) QM9 (Ramakrishnan et al., 2014) with around 134k molecules and maximum SMILES length of 37; (2) Zinc250K (Sterling & Irwin, 2015) with 250k molecules and maximum length of 136; (3) BBBP (Wu et al., 2018) with 862 molecules and maximum length of 186; (4) BACE (Wu et al., 2018) with 1332 molecules and maximum length of 161; (5) HIV (Wu et al., 2018) with 2372 molecules and maximum length of 193; (6) Chromophore DB (Joung et al., 2020) with 6810 molecules and maximum length of 511.
Dataset Splits Yes We follow the same protocol as Liu et al. (2024). We train our model on HIV, BACE, and BBBP. We use the same train, valid, and test splits as Liu et al. (2024). [...] For QM9 (Ramakrishnan et al., 2014), [...] The dataset has 133886 molecules with around 10% of the molecules in the test set and 5% in the validation set. For Zinc250K (Sterling & Irwin, 2015), [...] The dataset has 250k molecules with around 10% of the molecules in the test set and 5% in the validation set. [...] For Chromophore DB (Joung et al., 2020) [...] The dataset has 6810 molecules with around 5% of the molecules in the test set and 5% in the validation set.
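As a back-of-the-envelope check of the reported fractions, the implied split sizes can be computed directly; the helper below is illustrative (the exact splits follow Liu et al. (2024), and the reported percentages are approximate):

```python
def approx_splits(n_total, test_frac, valid_frac):
    """Approximate (train, valid, test) sizes from the reported fractions."""
    n_test = round(n_total * test_frac)
    n_valid = round(n_total * valid_frac)
    return n_total - n_valid - n_test, n_valid, n_test

# QM9: 133,886 molecules, ~10% test, ~5% validation
train, valid, test = approx_splits(133886, 0.10, 0.05)
```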
Hardware Specification Yes We generally use 1 to 4 A100 GPUs to train the models.
Software Dependencies No We rely on the following software: PyTorch (Paszke et al., 2019), Molecular Sets (MOSES) (Polykovskiy et al., 2020) and RDKit (Landrum et al., 2024).
Experiment Setup Yes For QM9 (Ramakrishnan et al., 2014), we train for 50 epochs with batch size 512, learning rate 1e-3, max length 150. For Zinc250K (Sterling & Irwin, 2015), we train for 50 epochs with batch size 512, learning rate 1e-3, max length 250. For HIV, BACE, and BBBP (Wu et al., 2018), we train for 10K epochs (same as done by Liu et al. (2024)), since these are small datasets, with batch size 128, learning rate 2.5e-4, max length 300. For Chromophore DB (Joung et al., 2020), we train for 1000 epochs with batch size 128, learning rate 2.5e-4, max length 600. For the pre-training on Zinc250K and fine-tuning on Chromophore-DB: we pre-train with batch size 512, learning rate 1e-3, and max length 600 for 50 epochs and fine-tune with batch size 128, learning rate 2.5e-4, and max length 600 for 100 epochs.
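For quick reference, the per-dataset hyperparameters above can be collected into a single lookup. The dictionary below is an illustrative summary of the from-scratch training runs (key names are my own, values are taken from the text; the Zinc250K-pretrain/Chromophore-DB-finetune schedule is omitted):

```python
# Hypothetical summary of the reported STGG+ training hyperparameters.
TRAIN_CONFIG = {
    "QM9":            {"epochs": 50,     "batch_size": 512, "lr": 1e-3,   "max_len": 150},
    "Zinc250K":       {"epochs": 50,     "batch_size": 512, "lr": 1e-3,   "max_len": 250},
    "HIV/BACE/BBBP":  {"epochs": 10_000, "batch_size": 128, "lr": 2.5e-4, "max_len": 300},
    "Chromophore DB": {"epochs": 1000,   "batch_size": 128, "lr": 2.5e-4, "max_len": 600},
}
```

The pattern in the table is that the two large datasets share one regime (large batch, higher learning rate, few epochs) while the small datasets use a smaller batch, a lower learning rate, and many more epochs.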