Learning Condensed Graph via Differentiable Atom Mapping for Reaction Yield Prediction

Authors: Ankit Ghosh, Gargee Kashyap, Sarthak Mittal, Nupur Jain, Raghavan B Sunoj, Abir De

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show that YIELDNET can predict the yield more accurately than the baselines. Furthermore, the model is trained only under the distant supervision of yield values, without requiring fine-grained supervision of atom mapping. (...) Our experimental evaluation across multiple datasets shows that YIELDNET is able to outperform several baselines by a significant margin.
Researcher Affiliation Academia (1) Department of Chemistry, IIT Bombay; (2) Department of Computer Science and Engineering, IIT Bombay. Correspondence to: Ankit Ghosh <EMAIL>.
Pseudocode No The paper describes the model architecture and mathematical formulations for GNNθ (Section C.1) and Input Differentiable GNNψ (Section C.2) using equations (22)-(35) and (36)-(45) respectively, but these are descriptions of computational steps rather than clearly structured pseudocode or algorithm blocks.
Open Source Code Yes Our code is available in https://github.com/ankitthreo/YieldNet.git.
Open Datasets Yes We carry out our experiments using eight datasets. They include (1) GP dataset, which is derived from Gas-phase Isomerization reactions (Grambow et al., 2020b); five datasets derived from the catalytic asymmetric N, S-acetal formation reaction (Zahrt et al., 2019), viz., (2) NS1, (3) NS2, (4) NS3, (5) NS4, (6) NS5; one dataset on (7) the Suzuki coupling reaction (SC); and another dataset on (8) Deoxyfluorination (Nielsen et al., 2018) (DF). (...) The Gas-Phase reaction datasets (Grambow et al., 2020b) are licensed under CC-BY 4.0. The USPTO dataset used here comes under the MIT License. The original USPTO dataset (Schwaller et al., 2021b) by Lowe (2017) comes under the CC0 1.0 License. The NS (Zahrt et al., 2019) and DF (Nielsen et al., 2018) datasets used here are available in (Singh & Sunoj, 2022).
Dataset Splits Yes We partitioned the datasets into 70% training, 10% validation, and 20% test folds. We generated ten random splits using different random seeds.
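The splitting protocol quoted above (70% train / 10% validation / 20% test, repeated over ten random seeds) can be sketched as follows; this is a minimal illustration, not the paper's implementation, and the function name and dataset size are hypothetical:

```python
import random

def make_splits(n_samples, seed, frac_train=0.7, frac_val=0.1):
    """Shuffle sample indices with a fixed seed and carve out
    70/10/20 train/validation/test folds (remainder goes to test)."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = round(frac_train * n_samples)
    n_val = round(frac_val * n_samples)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

# Ten random splits from ten different seeds, as in the protocol above.
splits = [make_splits(1000, seed) for seed in range(10)]
```

Each seed yields disjoint folds that together cover the full dataset, so test performance can be averaged across the ten splits.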
Hardware Specification Yes All the models are trained on an NVIDIA A100 80GB GPU. All the models are fully based on PyTorch (Paszke et al., 2019). We run on an Ubuntu 20.04.6 LTS machine with 2TB RAM, a 64-bit CPU, and an AMD EPYC 7742 64-Core Processor.
Software Dependencies No The paper mentions software such as PyTorch (Paszke et al., 2019) and NetworkX (Hagberg et al., 2008), and the operating system Ubuntu 20.04.6 LTS, but does not provide specific version numbers for the key software libraries and dependencies used to implement the methodology, other than the OS version.
Experiment Setup Yes We kept the batch size b the same across all models: b = 50 for the GP datasets and b = 8 for the rest. We train each model for 100 epochs and evaluate performance on the test dataset, selecting the epoch with the lowest validation MAE. We use the Adam optimizer for each model, with a Noam learning rate schedule with 2 warmup epochs, initial and final learning rates of 10^-4, and a maximum learning rate of 10^-3. Additionally, we keep the regularizer parameter ρ fixed at a value of 0.1 in our model. (...) 1. GNNθ. We set the dimension of node embeddings d = 20 (Eq. 30) and the dimension of edge embeddings D = 20 (Eq. 31). 2. Align. We set the temperature λ = 0.1 in Eq. (6), the Gumbel noise factor to 1.0, and the number of Sinkhorn iterations to T = 10. 3. Input Differentiable GNNψ. It shares parameters with GNNθ except ψ0 and ψ4 (Appendix C). We set dH = 20n, where n is the number of steps. 4. Transformerϕ1. Since dH = 20n, dH/n = 20 is the input and output dimension of Transformerϕ1. We set the number of heads to 5 and the feedforward dimension to 2048.
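The Align hyperparameters quoted above (temperature λ = 0.1, Gumbel noise factor 1.0, T = 10 Sinkhorn iterations) correspond to the standard Gumbel-Sinkhorn construction of a soft permutation matrix from a pairwise score matrix. A minimal pure-Python sketch of that construction follows; it is a generic illustration under those hyperparameters, not YIELDNET's actual atom-mapping code, and the function name and score matrix are hypothetical:

```python
import math
import random

def _logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def gumbel_sinkhorn(scores, lam=0.1, noise_factor=1.0, n_iters=10, seed=0):
    """Turn an n x n score matrix into a soft permutation: add Gumbel
    noise, divide by the temperature lam, then alternate row/column
    normalization in log space for n_iters Sinkhorn iterations."""
    rng = random.Random(seed)
    n = len(scores)
    # Gumbel(0, 1) samples: -log(-log(u)) for u ~ Uniform(0, 1).
    log_alpha = [
        [(scores[i][j] + noise_factor
          * -math.log(-math.log(max(rng.random(), 1e-12)))) / lam
         for j in range(n)]
        for i in range(n)
    ]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1 in probability space.
        for i in range(n):
            lse = _logsumexp(log_alpha[i])
            log_alpha[i] = [v - lse for v in log_alpha[i]]
        # Column normalization: each column sums to 1.
        for j in range(n):
            lse = _logsumexp([log_alpha[i][j] for i in range(n)])
            for i in range(n):
                log_alpha[i][j] -= lse
    return [[math.exp(v) for v in row] for row in log_alpha]
```

As the temperature lam shrinks, the output approaches a hard permutation matrix; the noise and normalization steps keep the mapping differentiable with respect to the scores, which is what allows training under distant supervision of yields alone.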