Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

3D Molecular Generation via Virtual Dynamics

Authors: Shuqi Lu, Lin Yao, Xi Chen, Hang Zheng, Di He, Guolin Ke

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experimental results on pocket-based molecular generation demonstrate that VD-Gen can generate novel 3D molecules that fill the target pocket cavity with high binding affinities, significantly outperforming previous baselines.
Researcher Affiliation | Collaboration | Shuqi Lu (DP Technology); Lin Yao (DP Technology); Xi Chen (DP Technology); Hang Zheng (DP Technology); Di He (Peking University); Guolin Ke (DP Technology)
Pseudocode | Yes | Algorithm 1: Iterative Movement; Algorithm 2: Backbone_Update; Algorithm 3: VD-Gen Inference Pipeline; Algorithm 4: Molecule Extraction Algorithm
Open Source Code | No | The text provides a GitHub link only for the 3D U-Net model, stating that this component is implemented 'based on' it. It does not provide an explicit statement or link for the source code of the full VD-Gen methodology, only for one component used within it. There is therefore no concrete access to the source code for the method described in this paper.
Open Datasets | Yes | Data: we use the same training dataset as used in previous works (Luo et al., 2022; Peng et al., 2022), specifically the CrossDocked data (Francoeur et al., 2020), to train VD-Gen. ... For the test set, we utilize 100 protein-ligand complex crystal structures from (Yang et al., 2022).
Dataset Splits | Yes | Luo et al. (2022) filter out data points whose binding pose RMSD is greater than 1Å, use MMSeqs2 (Steinegger & Söding, 2017) to cluster the data, and randomly select 100,000 protein-ligand pairs for training. For the test set, we utilize 100 protein-ligand complex crystal structures from (Yang et al., 2022)... To prevent data leakage, we exclude complexes from the training data whose protein sequences bear similarity to those in the test set.
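The split protocol quoted above (RMSD filtering, deduplication against test-set proteins, random sampling of 100,000 pairs) can be sketched in plain Python. This is a minimal illustration, not the authors' code: all function and field names are hypothetical, and exact sequence matching stands in for the MMSeqs2-style similarity search the paper actually uses.

```python
import random

def build_split(pairs, test_proteins, rmsd_thresh=1.0, n_train=100_000, seed=0):
    """Hypothetical sketch of the CrossDocked split protocol.

    `pairs` is a list of dicts with keys 'protein_seq', 'ligand', 'rmsd'.
    """
    # 1) Drop data points whose binding-pose RMSD exceeds 1 Angstrom.
    kept = [p for p in pairs if p["rmsd"] <= rmsd_thresh]
    # 2) Exclude complexes whose protein matches a test-set protein.
    #    (The paper uses sequence similarity, e.g. via MMSeqs2, rather
    #    than the exact match used here for illustration.)
    kept = [p for p in kept if p["protein_seq"] not in test_proteins]
    # 3) Randomly sample up to n_train pairs for training.
    rng = random.Random(seed)
    rng.shuffle(kept)
    return kept[:n_train]

# Toy usage with made-up records:
pairs = [
    {"protein_seq": "AAA", "ligand": "l1", "rmsd": 0.5},
    {"protein_seq": "BBB", "ligand": "l2", "rmsd": 2.0},  # fails RMSD filter
    {"protein_seq": "CCC", "ligand": "l3", "rmsd": 0.8},  # leaks into test set
]
train = build_split(pairs, test_proteins={"CCC"}, n_train=2)
```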
Hardware Specification | Yes | Initially, the 3D U-Net model used for pocket cavity detection is trained independently, requiring around 20 hours on 8 NVIDIA A100 GPUs. Subsequently, the parameters of the 3D U-Net model are frozen, and the entire VD-Gen pipeline is trained end-to-end, taking approximately 15 hours on 8 NVIDIA A100 GPUs.
Software Dependencies | Yes | We utilize AutoDock Vina 1.2 (Eberhardt et al., 2021) to obtain Vina scores.
Experiment Setup | Yes | The detailed configurations of VD-Gen are listed in Table 3 and Table 4. ...
Table 4: Settings for SE(3) models in VD-Gen.
  Training
    Particle encoder layers: 12
    Pocket encoder layers: 15
    Particle-Pocket Attention layers: 3
    Peak learning rate: 5e-5
    Batch size: 32
    Max training steps: 100k
    Warmup steps: 10k
    Attention heads: 64
    FFN dropout: 0.1
    Attention dropout: 0.1
    Embedding dropout: 0.1
    Weight decay: 1e-4
    Embedding dim: 512
    FFN hidden dim: 2048
    Activation function: GELU
    Learning rate decay: Linear
    Adam ϵ: 1e-6
    Adam (β1, β2): (0.9, 0.99)
    Gradient clip norm: 1.0
    Loss weight of L_an in Particle Initialization: 1.0
    Loss weight of Particle Movement: 1.0
    Loss weight of L_Merge in Molecule Extraction: 10
    Loss weight of L_error_pred in Molecule Extraction: 0.01
    Loss weight of Molecule Refinement: 1.0
    Loss weight for Confidence Prediction: 0.01
    τ, the clip value for coordinate loss: 2.0
    δ, the threshold for coordinate regularization: 1.0
    ζ, the filtering threshold in Molecule Extraction: 2.0
    R1, iterations in Particle Movement: sampled from [1, 4]
    R2, iterations in Molecule Refinement: sampled from [1, 4]
    k_vp times the number of atoms, used in Particle Initialization: sampled from [16.0, 18.0]
  Inference
    R1, iterations in Particle Movement: 4
    R2, iterations in Molecule Refinement: 16
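The optimizer and schedule entries reported above can be restated as a small config sketch. This is our reading of the table, not the authors' code: in particular, combining "Warmup steps 10K" with "Learning rate decay: Linear" as linear warmup followed by linear decay to zero at 100k steps is an assumption.

```python
# Optimizer settings from the reported Table 4, as a plain config dict.
CONFIG = {
    "peak_lr": 5e-5,
    "warmup_steps": 10_000,
    "max_steps": 100_000,
    "adam_betas": (0.9, 0.99),
    "adam_eps": 1e-6,
    "weight_decay": 1e-4,
    "grad_clip_norm": 1.0,
    "batch_size": 32,
}

def lr_at(step,
          peak_lr=CONFIG["peak_lr"],
          warmup=CONFIG["warmup_steps"],
          total=CONFIG["max_steps"]):
    """Assumed schedule: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup))
```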