Direct Molecular Conformation Generation

Authors: Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves the best results on the GEOM-QM9 and GEOM-Drugs datasets. Further analysis shows that our generated conformations have properties (e.g., the HOMO-LUMO gap) closer to those of the ground-truth conformations. In addition, our method improves molecular docking by providing better initial conformations. All the results demonstrate the effectiveness of our method and the great potential of the direct approach. The code is released at https://github.com/DirectMolecularConfGen/DMCG. Section 4 is titled 'Experiments' and details dataset usage, evaluation metrics, and comparative results.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China; 2 Microsoft Research AI4Science; 3 Renmin University of China; 4 Xi'an Jiaotong University. The affiliations include both universities (academic) and Microsoft Research AI4Science (industry).
Pseudocode | No | The paper describes the model architecture and operations mathematically (e.g., Equations 3-7) and provides a diagram (Figure 3: network architecture of the l-th block), but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code | Yes | The code is released at https://github.com/DirectMolecularConfGen/DMCG.
Open Datasets | Yes | Following prior works (Xu et al., 2021a; Shi et al., 2021), we use the GEOM-QM9 and GEOM-Drugs datasets (Axelrod & Gomez-Bombarelli, 2021) for conformation generation.
Dataset Splits | Yes | For the small-scale setting, we use the same datasets provided by Shi et al. (2021) for fair comparison with prior works. The training, validation, and test sets of the two datasets consist of 200K, 2.5K, and 22,408 (for GEOM-QM9) / 14,324 (for GEOM-Drugs) molecule-conformation pairs, respectively. After that, we work on the large-scale setting by sampling larger datasets from the original GEOM to validate the scalability of our method. We use all data in GEOM-QM9 and 2.2M molecule-conformation pairs for GEOM-Drugs. The training, validation, and test sets for the larger GEOM-QM9 setting contain 1.37M, 165K, and 174K pairs, and those for the larger GEOM-Drugs setting contain 2M, 100K, and 100K pairs.
Hardware Specification | Yes | For the two small-scale settings, the experiments are conducted on a single V100 GPU. For the two large-scale settings, we use two V100 GPUs.
Software Dependencies | No | The paper mentions the 'PyTorch profiler', 'torch.linalg.eig', the 'graph_tool' toolkit, and the 'AdamW optimizer'. While these indicate the software used, specific version numbers for these components or for other libraries (such as Python or PyTorch) are not provided.
Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov & Hutter, 2019) with initial learning rate η0 = 2 × 10^−4 and weight decay 0.01. In the first 4000 iterations, the learning rate is linearly increased from 10^−6 to 2 × 10^−4. After that, we use a cosine learning rate scheduler... Similarly, we also use a cosine scheduler to dynamically set β in the range [0.0001, βmax]. The batch size is fixed at 128. All models are trained for 100 epochs. The detailed hyper-parameters are described in Table 5.
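The warmup-then-cosine schedule quoted above can be sketched as a plain function of the training step. This is a minimal illustration, not the authors' code: the paper specifies the 4000-iteration linear warmup from 10^−6 to 2 × 10^−4 and a cosine scheduler afterwards, but the function name, the `total_steps` parameter, and the decay-to-zero endpoint are assumptions for this sketch.

```python
import math

def lr_at(step: int, total_steps: int, warmup_steps: int = 4000,
          lr_min: float = 1e-6, lr_peak: float = 2e-4) -> float:
    """Linear warmup from lr_min to lr_peak, then cosine decay.

    Illustrative sketch only: the signature and the assumption that the
    cosine phase decays to zero are not taken from the paper.
    """
    if step < warmup_steps:
        # Linear warmup: 1e-6 -> 2e-4 over the first 4000 iterations.
        return lr_min + (lr_peak - lr_min) * step / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_peak * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop, a function like this would typically be wrapped in `torch.optim.lr_scheduler.LambdaLR` around an `AdamW(params, lr=2e-4, weight_decay=0.01)` optimizer; the paper's dynamic β schedule over [0.0001, βmax] could follow the same cosine pattern.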