Geometric Representation Condition Improves Equivariant Molecule Generation
Authors: Zian Li, Cai Zhou, Xiyuan Wang, Xingang Peng, Muhan Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, our method achieves the following significant improvements: Substantially enhancing the quality (e.g., molecule stability) of the generated molecules on the widely used QM9 and GEOM-DRUG datasets. |
| Researcher Affiliation | Academia | 1Institute for Artificial Intelligence, Peking University, Beijing, China 2School of Intelligence Science and Technology, Peking University, Beijing, China 3Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA 4Department of Automation, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | We provide the high-level training and sampling algorithm for GeoRCG in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/GraphPKU/GeoRCG. |
| Open Datasets | Yes | As a method for 3D molecule generation, we evaluate GeoRCG on the widely used datasets QM9 (Ramakrishnan et al., 2014) and GEOM-DRUG (Gebauer et al., 2019; 2022; Axelrod & Gomez-Bombarelli, 2022). |
| Dataset Splits | Yes | To ensure fair comparisons, we follow the dataset split and configurations exactly as in Anderson et al. (2019); Hoogeboom et al. (2022); Xu et al. (2023). |
| Hardware Specification | Yes | Training on QM9 takes approximately 2.5 days on a single Nvidia 4090, while training on GEOM-DRUG takes around 4 days on a single Nvidia A800. ... Training takes approximately 6 days on QM9 using a single Nvidia 4090, and around 10 days on GEOM-DRUG using two Nvidia A800 GPUs. |
| Software Dependencies | No | The paper mentions "RDKit" and "Open Babel" for bond determination and energy calculation, but does not specify version numbers for these or for any other software dependencies. |
| Experiment Setup | Yes | We use 18 blocks of residual MLP layers with 1536 hidden dimensions, 1000 diffusion steps, and a linear noise schedule for β_t. The representation generator is trained for 2000 epochs with a batch size of 128 for both the QM9 and GEOM-DRUG datasets. ... For the EGNN hyperparameters, we use 9 layers with 256 hidden dimensions for QM9 and 4 layers with 256 hidden dimensions for GEOM-DRUG. The number of diffusion steps is set to 1000 (except for cases in Table 4 that generate molecules with fewer steps), and we employ the polynomial scheduler for α_t^(M). ... During training, we use a batch size of 128 and 3000 epochs on QM9, and a batch size of 64 and 20 epochs on GEOM-DRUG. |
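The hyperparameters quoted in the Experiment Setup row describe a two-stage setup: a residual-MLP representation generator and a representation-conditioned EGNN molecule generator. A minimal sketch collecting those reported values into configuration objects is below; the class and field names are hypothetical (not from the paper's code), and only the numeric values and schedule names come from the table above.

```python
from dataclasses import dataclass


@dataclass
class RepGeneratorConfig:
    """Stage 1: representation generator (values as reported in the paper)."""
    num_blocks: int = 18            # residual MLP blocks
    hidden_dim: int = 1536
    diffusion_steps: int = 1000
    noise_schedule: str = "linear"  # linear schedule for beta_t
    epochs: int = 2000
    batch_size: int = 128           # same for QM9 and GEOM-DRUG


@dataclass
class MolGeneratorConfig:
    """Stage 2: representation-conditioned EGNN molecule generator."""
    egnn_layers: int
    epochs: int
    batch_size: int
    hidden_dim: int = 256
    diffusion_steps: int = 1000
    noise_schedule: str = "polynomial"  # polynomial schedule for alpha_t^(M)


# Dataset-specific Stage-2 settings from the quoted excerpt.
QM9_CONFIG = MolGeneratorConfig(egnn_layers=9, epochs=3000, batch_size=128)
GEOM_DRUG_CONFIG = MolGeneratorConfig(egnn_layers=4, epochs=20, batch_size=64)
```

This layout makes the asymmetry between the two datasets explicit: GEOM-DRUG uses a shallower EGNN and far fewer epochs, consistent with its much larger size.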