reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Self-Supervised Diffusion Models for Electron-Aware Molecular Representation Learning

Authors: Gyoung S. Na, Chanyoung Park

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our experiments, we focus on evaluating the prediction capabilities of the machine learning methods on biased and relatively small experimental datasets rather than simulated datasets (e.g., QM9 dataset (Ramakrishnan et al., 2014)). Although the simulated datasets are useful for analyzing rough statistics on small molecules, they are not appropriate to evaluate the prediction capabilities of the machine learning methods on real-world molecular physics due to the following two reasons: 1) The simulated datasets do not contain complex and large molecules due to the large time complexity of the quantum mechanical calculations. 2) The simulated datasets do not sufficiently reflect the quantum mechanical uncertainty in real-world molecules (Sim et al., 2018). For these reasons, we used experimentally collected molecular datasets from physicochemistry, toxicity, pharmacokinetics, and optical applications to evaluate the practical potential of DELID. For all benchmark molecular datasets, DELID achieved state-of-the-art performance in predicting experimentally observed properties of real-world complex molecules.
Researcher Affiliation	Academia	Gyoung S. Na (KRICT, Republic of Korea); Chanyoung Park (KAIST, Republic of Korea)
Pseudocode	Yes	Algorithm 1 shows an algorithmic description of the forward and training processes of DELID.
Open Source Code	Yes	The source code of DELID is publicly available at https://github.com/ngs00/DELID.
Open Datasets	Yes	We employed nine benchmark molecular datasets constructed by real-world chemical experiments. The benchmark molecular datasets were selected from well-known databases in molecular science (Wu et al., 2018; Wu & Wei, 2018; Mendez et al., 2019; Joung et al., 2020).
Dataset Splits	Yes	For all datasets, the R2-scores were measured by the 5-fold cross-validation.
Hardware Specification	Yes	The execution time was measured in a machine with Intel i9-12900K CPU, 128G memory, and NVIDIA GeForce RTX 3090 Ti GPU.
Software Dependencies	Yes	DELID and experiment scripts were implemented with PyTorch 2.0.0+cu117 and PyTorch Geometric 2.3.1 under Python 3.9.
Experiment Setup	Yes	The model parameters of DELID were optimized by the AdamW optimizer (Loshchilov & Hutter, 2017) for all experiments in this paper. The initial learning rate and L2 regularization coefficients were fixed to 5e-4 and 5e-6 for all benchmark datasets, respectively. Batch size is also fixed to 64 for all benchmark datasets. The GNN-based embedding networks were constructed by two node aggregation layers and one dense layer with 64 output channels.
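
The Dataset Splits and Experiment Setup rows can be summarized in a small, dependency-free sketch. It is not taken from the DELID repository: the names `ReportedSetup`, `k_fold_indices`, and `r2_score` are hypothetical, and the shuffling, the seed, and the way fold sizes are balanced are assumptions, since the excerpts above state only that 5-fold cross-validation and R2-scores were used. The hyperparameter values are the ones quoted in the table.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; the field names are illustrative."""
    optimizer: str = "AdamW"
    learning_rate: float = 5e-4    # initial learning rate
    weight_decay: float = 5e-6     # L2 regularization coefficient
    batch_size: int = 64
    n_aggregation_layers: int = 2  # node aggregation layers in the GNN
    dense_out_channels: int = 64   # output channels of the dense layer
    n_folds: int = 5               # 5-fold cross-validation


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once.

    Shuffling with a fixed seed is an assumption, not a detail from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, test_idx
        start = stop


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot
```

In use, a model would be trained on each `train_idx` with the settings in `ReportedSetup`, and `r2_score` would be computed on the matching `test_idx`. Whether the paper averages the per-fold scores or pools the predictions is not stated in the excerpts above.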