Deep Signature: Characterization of Large-Scale Molecular Dynamics

Authors: Tiexin Qin, Mengxu Zhu, Chunyang Li, Terry Lyons, Hong Yan, Haoliang Li

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Furthermore, experimental results on three benchmarks of biological processes verify that our approach can achieve superior performance compared to baseline methods."
Researcher Affiliation | Collaboration | City University of Hong Kong, Chengdu Institute of Biological Products Co., Ltd., and University of Oxford
Pseudocode | No | The paper describes the methodology using textual descriptions and figures (Fig. 1, Fig. 2, Fig. 3) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper refers to a third-party Python package, Signatory, used for computations: "implemented based on Signatory, a Python package that facilitates differentiable computations of the signature and log-signature transforms on both CPU and GPU" (https://github.com/patrick-kidger/signatory). However, it does not provide an explicit statement or link for the source code of the Deep Signature method described in the paper.
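For context on what Signatory computes, the depth-2 truncated path signature of a piecewise-linear path can be sketched in a few lines of NumPy via Chen's identity. This is an illustrative sketch only, not the authors' code or the Signatory implementation:

```python
import numpy as np

def signature_level2(path):
    """Depth-2 signature of a piecewise-linear path (T x d array).

    Level 1 is the total increment; level 2 collects the iterated
    integrals, accumulated segment by segment via Chen's identity.
    """
    d = path.shape[1]
    s1 = np.zeros(d)        # level 1: total increment
    s2 = np.zeros((d, d))   # level 2: iterated integrals
    for k in range(len(path) - 1):
        dx = path[k + 1] - path[k]
        # Chen's identity: appending a straight segment adds the cross
        # term s1 (x) dx plus the segment's own term dx (x) dx / 2.
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return s1, s2
```

A useful sanity check is reparametrization invariance: inserting a collinear midpoint into a straight segment leaves both signature levels unchanged.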
Open Datasets | Yes | "We begin with a synthetic dataset that reports gene regulatory dynamics (Gao et al., 2016). We then assess the performance on two large-scale MD simulation datasets, including epidermal growth factor receptor (EGFR) mutant dynamics (Zhu et al., 2021) and G protein-coupled receptors (GPCR) dynamics (Rodríguez-Espigares et al., 2020). More details on dataset construction can be found in Appendix B. We download our data from the GPCRmd (http://gpcrmd.org/) (Rodríguez-Espigares et al., 2020) database."
Dataset Splits | Yes | "We employ a five-fold cross-validation strategy for training our model. In contrast to conventional random division, we take the temporal nature of the trajectory data into consideration for data partition. Specifically, each trajectory is divided into 5 groups with the same time interval according to its temporal order. Subsequently, we further partition the data within each group into five folds. The validation set is constructed by selecting one fold from each group and is employed for model selection, while the remaining four folds within each group are gathered to form the training set. This process is repeated five times sequentially, resulting in the creation of the five-fold cross-validation dataset. Moreover, for each run, we evaluate the prediction accuracy of our method on an independent unseen test set and report the averaged results."
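The temporal partitioning described above can be sketched as follows. This is a minimal reconstruction from the quoted description, not the authors' code; in particular, whether the five folds within each group are contiguous or shuffled is not specified in the paper, and contiguous sub-folds are assumed here:

```python
import numpy as np

def temporal_five_folds(n_frames, n_groups=5, n_folds=5):
    """Sketch of the described split: cut a trajectory of n_frames into
    contiguous temporal groups, split each group into folds, then take
    one fold per group as validation in each of the five rounds."""
    groups = np.array_split(np.arange(n_frames), n_groups)
    sub = [np.array_split(g, n_folds) for g in groups]
    for v in range(n_folds):
        val = np.concatenate([s[v] for s in sub])
        train = np.concatenate(
            [s[f] for s in sub for f in range(n_folds) if f != v]
        )
        yield np.sort(train), np.sort(val)
```

Each round's validation set thus samples every temporal segment of the trajectory, rather than one contiguous block.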
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU or GPU models, processor types, or memory specifications. It only mentions general computing environments like "CPU and GPU" in the context of a third-party library.
Software Dependencies | No | The paper mentions "Signatory" and the "Amber software suite" with the "ff99SB and GAFF force fields" but does not specify version numbers for these software components. It also mentions the "Adam" optimizer but no version.
Experiment Setup | Yes | "The coefficients of loss terms are set as λ1 = 1, λ2 = 0.01, and λ3 = 10. We optimize our model using Adam with an initial learning rate of 5e-4 and a weight decay of 1e-4." Elsewhere: "The model is trained for 200 epochs with an initial learning rate 5e-5 and a weight decay of 1e-5."
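The quoted hyperparameters can be collected into a single configuration fragment. This is a hypothetical layout for illustration; the loss-term names are placeholders (the paper's term names are not given here), and note that the response quotes two different learning-rate/weight-decay settings:

```python
# Hyperparameters quoted from the paper; l_task / l_reg / l_aux are
# placeholder names for the three loss terms, not the paper's names.
config = {
    "loss_weights": {"lambda1": 1.0, "lambda2": 0.01, "lambda3": 10.0},
    "optimizer": "Adam",
    "lr": 5e-4,            # second quoted setting uses 5e-5
    "weight_decay": 1e-4,  # second quoted setting uses 1e-5
    "epochs": 200,
}
# total loss = lambda1 * l_task + lambda2 * l_reg + lambda3 * l_aux
```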