DenoiseVAE: Learning Molecule-Adaptive Noise Distributions for Denoising-based 3D Molecular Pre-training
Authors: Yurou Liu, Jiahao Chen, Rui Jiao, Jiangmeng Li, Wenbing Huang, Bing Su
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that DenoiseVAE outperforms current state-of-the-art methods on various molecular property prediction tasks, demonstrating its effectiveness. (Evidence from Section 5: Experiments, 5.1 Settings, 5.2 Main Results, 5.3 Ablation Studies) |
| Researcher Affiliation | Academia | 1 Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; 2 Department of Computer Science and Technology, Tsinghua University, Beijing, China; 3 Institute for AI Industry Research, Tsinghua University, Beijing, China; 4 Institute of Software, Chinese Academy of Sciences, Beijing, China |
| Pseudocode | Yes | We provide the pseudocode in Appendix A.6. Algorithm 1: Algorithm of our DenoiseVAE |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the DenoiseVAE methodology or a link to a code repository. |
| Open Datasets | Yes | We leverage a large-scale molecular dataset PCQM4Mv2 (Nakata & Shimazaki, 2017) as our pre-training dataset. For downstream tasks, we evaluate our method both on molecular and complex property prediction. For the former, we test on QM9 (Ruddigkeit et al., 2012; Ramakrishnan et al., 2014), MD17 (Chmiela et al., 2017) and PCQM4Mv2 (Nakata & Shimazaki, 2017). For the latter, we adopt the widely recognized PDBBind dataset (v2019) for the ligand binding affinity (LBA) prediction, adhering to the 30% and 60% protein sequence identity splits and preprocessing methods outlined in Atom3D (Townshend et al., 2020). |
| Dataset Splits | Yes | QM9 contains 12 chemical properties of small molecules with stable 3D structures. We follow previous work (Jiao et al., 2023) and split the dataset into training, validation, and test sets, which contain 100k, 18k, and 13k conformations, respectively. MD17 contains the simulated dynamical trajectories of 8 small organic molecules, with the recorded energy and force at each frame; we select 9,500 and 500 frames as the training and validation sets, respectively. PCQM4Mv2 (Nakata & Shimazaki, 2017) provides predefined validation and test sets; we report performance on the validation set following the standard protocol (see Appendix A.7 for details). For PDBBind (v2019), we adhere to the 30% and 60% protein sequence identity splits and the preprocessing outlined in Atom3D (Townshend et al., 2020). |
| Hardware Specification | Yes | For training resources, all experiments are conducted on Intel(R) Xeon(R) Gold 5318Y CPU @ 2.10GHz with a single RTX A3090 GPU. |
| Software Dependencies | No | The paper mentions the RDKit library (Landrum, 2006) for energy calculation but does not specify a version number for it or any other software dependencies. |
| Experiment Setup | Yes | Experimental setup: We set the prior distribution of each noisy coordinate as a Gaussian with mean xi and standard deviation σ. Unless otherwise noted, we set σ = 0.1 for all experiments. Table 13 (hyper-parameters for pre-training): Dataset: PCQM4Mv2; Batch size: 128; Optimizer: AdamW; Max learning rate: 0.0005; Learning rate decay policy: Cosine; Network architecture: Equivariant Graph Neural Network (EGNN); Noise Generator layers: 4; Denoising Module layers: 7 |
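The fixed-σ Gaussian prior described in the experiment setup (mean at the clean coordinate, standard deviation σ = 0.1) can be sketched as a coordinate-perturbation step. This is an illustrative NumPy sketch, not the paper's implementation; the function name `perturb_coordinates` is an assumption.

```python
import numpy as np

def perturb_coordinates(coords, sigma=0.1, rng=None):
    """Add isotropic Gaussian noise to 3D atom coordinates.

    Sketch of a fixed-sigma Gaussian prior N(x_i, sigma^2) over each
    noisy coordinate (sigma = 0.1 by default, as in the paper's setup).
    Returns the noisy coordinates and the sampled noise, which a
    denoising module would be trained to predict.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=sigma, size=coords.shape)
    return coords + noise, noise

# Example: perturb a hypothetical 5-atom molecule.
coords = np.zeros((5, 3))
noisy, noise = perturb_coordinates(coords, sigma=0.1)
```

Note that DenoiseVAE's contribution is precisely to replace this fixed σ with a learned, molecule-adaptive noise distribution; the fixed-σ version above is the baseline prior stated in the setup.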
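The QM9 split reported above (100k train, 18k validation, 13k test conformations) can be reproduced by an index split of this shape. This is a minimal sketch under the assumption of a seeded random permutation; the function name, seed, and permutation strategy are illustrative, not from the paper.

```python
import numpy as np

def random_split(n, n_train, n_valid, seed=42):
    """Split indices 0..n-1 into train/valid/test index arrays.

    Sizes follow the reported QM9 protocol: 100k train, 18k valid,
    and the remaining 13k as test. The seeded permutation is an
    illustrative assumption, not the paper's exact procedure.
    """
    perm = np.random.default_rng(seed).permutation(n)
    train = perm[:n_train]
    valid = perm[n_train:n_train + n_valid]
    test = perm[n_train + n_valid:]
    return train, valid, test

# QM9 has ~131k conformations: 100k + 18k + 13k.
train_idx, valid_idx, test_idx = random_split(131_000, 100_000, 18_000)
```

The three index arrays are disjoint by construction, since they are non-overlapping slices of one permutation.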