Deep Signature: Characterization of Large-Scale Molecular Dynamics
Authors: Tiexin Qin, Mengxu Zhu, Chunyang Li, Terry Lyons, Hong Yan, Haoliang Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, experimental results on three benchmarks of biological processes verify that our approach can achieve superior performance compared to baseline methods. |
| Researcher Affiliation | Collaboration | City University of Hong Kong, Chengdu Institute of Biological Products Co. Ltd., and University of Oxford |
| Pseudocode | No | The paper describes the methodology using textual descriptions and figures (Fig. 1, Fig. 2, Fig. 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to a third-party Python package, Signatory, used for computations: "implemented based on Signatory, a Python package that facilitates differentiable computations of the signature and log-signature transforms on both CPU and GPU" (https://github.com/patrick-kidger/signatory). However, it does not provide an explicit statement or link for the source code of the Deep Signature methodology described in this paper. |
| Open Datasets | Yes | We begin with a synthetic dataset that reports gene regulatory dynamics (Gao et al., 2016). We then assess the performance on two large-scale MD simulation datasets, including epidermal growth factor receptor (EGFR) mutant dynamics (Zhu et al., 2021) and G protein-coupled receptors (GPCR) dynamics (Rodríguez-Espigares et al., 2020). More details on dataset construction can be found in Appendix B. We download our data from the GPCRmd (http://gpcrmd.org/) (Rodríguez-Espigares et al., 2020) database |
| Dataset Splits | Yes | We employ a five-fold cross-validation strategy for training our model. In contrast to conventional random division, we take the temporal nature of the trajectory data into consideration for data partition. Specifically, each trajectory is divided into 5 groups with the same time interval according to its temporal order. Subsequently, we further partition the data within each group into five folds. The validation set is constructed by selecting one fold from each group and is employed for model selection, while the remaining four folds within each group are gathered to form the training set. This process is repeated five times sequentially, resulting in the creation of the five-fold cross-validation dataset. Moreover, for each running, we evaluate the prediction accuracy of our method on an independent unseen test set and report the averaged results. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU or GPU models, processor types, or memory specifications. It only mentions general computing environments like "CPU and GPU" in the context of a third-party library. |
| Software Dependencies | No | The paper mentions "Signatory" and the "Amber software suite" with "Ff99SB and gaff force fields" but does not specify version numbers for these software components. It also mentions "Adam" optimizer but no version. |
| Experiment Setup | Yes | The coefficients of loss terms are set as λ1 = 1, λ2 = 0.01, and λ3 = 10. We optimize our model using Adam with an initial learning rate of 5e-4 and a weight decay of 1e-4. The model is trained for 200 epochs with an initial learning rate 5e-5 and a weight decay of 1e-5. |
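The temporal five-fold cross-validation described in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name, the use of NumPy, and the trajectory length are assumptions made for the example.

```python
import numpy as np

def temporal_five_fold(n_frames, n_groups=5, n_folds=5):
    """Sketch of the temporal split described in the paper (assumed API).

    Frame indices of one trajectory are first cut into `n_groups`
    contiguous groups in temporal order; each group is then partitioned
    into `n_folds` folds. For split k, fold k of every group forms the
    validation set and the remaining folds form the training set.
    """
    frames = np.arange(n_frames)
    groups = np.array_split(frames, n_groups)  # preserves temporal order
    splits = []
    for k in range(n_folds):
        val_parts, train_parts = [], []
        for g in groups:
            folds = np.array_split(g, n_folds)
            val_parts.append(folds[k])
            train_parts.extend(folds[:k] + folds[k + 1:])
        splits.append((np.concatenate(train_parts),
                       np.concatenate(val_parts)))
    return splits

# Hypothetical trajectory of 100 frames: each of the 5 splits yields an
# 80/20 train/validation partition drawn evenly from all temporal groups.
splits = temporal_five_fold(100)
train_idx, val_idx = splits[0]
```

Note that an independent unseen test set, as the paper states, sits outside this procedure entirely; the splits above cover only training and model selection.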