Geometric Representation Condition Improves Equivariant Molecule Generation
Authors: Zian Li, Cai Zhou, Xiyuan Wang, Xingang Peng, Muhan Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, our method achieves the following significant improvements: Substantially enhancing the quality (e.g., molecule stability) of the generated molecules on the widely used QM9 and GEOM-DRUG datasets. |
| Researcher Affiliation | Academia | 1Institute for Artificial Intelligence, Peking University, Beijing, China 2School of Intelligence Science and Technology, Peking University, Beijing, China 3Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA 4Department of Automation, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | We provide the high-level training and sampling algorithm for GeoRCG in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/GraphPKU/GeoRCG. |
| Open Datasets | Yes | As a method for 3D molecule generation, we evaluate GeoRCG on the widely used datasets QM9 (Ramakrishnan et al., 2014) and GEOM-DRUG (Gebauer et al., 2019; 2022; Axelrod & Gomez-Bombarelli, 2022). |
| Dataset Splits | Yes | To ensure fair comparisons, we follow the dataset split and configurations exactly as in Anderson et al. (2019); Hoogeboom et al. (2022); Xu et al. (2023). |
| Hardware Specification | Yes | Training on QM9 takes approximately 2.5 days on a single Nvidia 4090, while training on GEOM-DRUG takes around 4 days on a single Nvidia A800. ... Training takes approximately 6 days on QM9 using a single Nvidia 4090, and around 10 days on GEOM-DRUG using two Nvidia A800 GPUs. |
| Software Dependencies | No | The paper mentions "RDKit" and "Open Babel" for bond determination and energy calculation, but does not specify version numbers for these or for any other software dependencies. |
| Experiment Setup | Yes | We use 18 blocks of residual MLP layers with 1536 hidden dimensions, 1000 diffusion steps, and a linear noise schedule for β_t. The representation generator is trained for 2000 epochs with a batch size of 128 for both the QM9 and GEOM-DRUG datasets. ... For the EGNN hyperparameters, we use 9 layers with 256 hidden dimensions for QM9 and 4 layers with 256 hidden dimensions for GEOM-DRUG. The number of diffusion steps is set to 1000 (except for cases in Table 4 that generate molecules with fewer steps), and we employ the polynomial scheduler for α_t^(M). ... During training, we use a batch size of 128 and 3000 epochs on QM9, and a batch size of 64 and 20 epochs on GEOM-DRUG. |
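The hyperparameters quoted in the Experiment Setup row describe a two-stage setup: a residual-MLP representation generator and a representation-conditioned EGNN molecule generator. A minimal sketch collecting those reported values into configuration objects is below; the class and field names are hypothetical (not from the paper's code), and only the numeric values and schedule names come from the table above.

```python
from dataclasses import dataclass


@dataclass
class RepGeneratorConfig:
    """Stage 1: representation generator (values as reported in the paper)."""
    num_blocks: int = 18            # residual MLP blocks
    hidden_dim: int = 1536
    diffusion_steps: int = 1000
    noise_schedule: str = "linear"  # linear schedule for beta_t
    epochs: int = 2000
    batch_size: int = 128           # same for QM9 and GEOM-DRUG


@dataclass
class MolGeneratorConfig:
    """Stage 2: representation-conditioned EGNN molecule generator."""
    egnn_layers: int
    epochs: int
    batch_size: int
    hidden_dim: int = 256
    diffusion_steps: int = 1000
    noise_schedule: str = "polynomial"  # polynomial schedule for alpha_t^(M)


# Dataset-specific Stage-2 settings from the quoted excerpt.
QM9_CONFIG = MolGeneratorConfig(egnn_layers=9, epochs=3000, batch_size=128)
GEOM_DRUG_CONFIG = MolGeneratorConfig(egnn_layers=4, epochs=20, batch_size=64)
```

This layout makes the asymmetry between the two datasets explicit: GEOM-DRUG uses a shallower EGNN and far fewer epochs, consistent with its much larger size.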