Large Language-Geometry Model: When LLM meets Equivariance

Authors: Zongzhao Li, Jiacheng Cen, Bing Su, Tingyang Xu, Yu Rong, Deli Zhao, Wenbing Huang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that EquiLLM delivers significant improvements over previous methods across molecular dynamics simulation, human motion simulation, and antibody design, highlighting its promising generalizability.
Researcher Affiliation | Collaboration | 1. Gaoling School of Artificial Intelligence, Renmin University of China; 2. Beijing Key Laboratory of Research on Large Models and Intelligent Governance; 3. Engineering Research Center of Next-Generation Intelligent Search and Recommendation, MOE; 4. DAMO Academy, Alibaba Group, Hangzhou, China; 5. Hupan Lab, Hangzhou, China.
Pseudocode | No | The paper describes the model architecture and mathematical formulations (e.g., the Equivariant Adapter equations in Section 3.2) but does not include a distinct block labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper mentions reproducing results using the 'official implementation' of a baseline method (GeoAB) but provides no explicit statement of, or link to, source code for the proposed EquiLLM framework.
Open Datasets | Yes | In the dynamic simulation task, to demonstrate the broad applicability of our model across varying scales, we conduct experiments on two distinct datasets: the molecular-level MD17 (Chmiela et al., 2017) dataset and the macro-level Human Motion Capture (De la Torre et al., 2009) dataset. Following the previous study MEAN (Kong et al., 2022), we selected complete antibody-antigen complexes from the SAbDab (Dunbar et al., 2014) dataset to construct the training and validation sets. ... For the test set, we selected 60 diverse complexes from the RAbD (Adolf-Bryfogle et al., 2018) dataset to evaluate the performance of different methods.
Dataset Splits | Yes | To expedite the dynamics simulations, we implement a sampling strategy based on previous research (Huang et al., 2022) to extract a subset of trajectories for training, validation, and testing. This approach involves randomly selecting an initial point and then sampling 2T timestamps: the first T timestamps are used as input to the models, while the remaining T timestamps represent the future states the models must predict. B.1 Implementation Details on MD17: The training, validation, and testing sets consist of 500, 2000, and 2000 samples, respectively. B.2 Implementation Details on Motion Capture: For subject #35 (Walk), the dataset comprises 1100 training, 600 validation, and 600 testing trajectories, whereas subject #102 (Basketball) includes 600 training, 300 validation, and 300 testing trajectories.
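The trajectory-sampling scheme reported above (pick a random start, take 2T consecutive timestamps, feed the first T to the model and predict the last T) could be sketched as follows; the function and variable names are illustrative, not taken from the paper's code:

```python
import numpy as np

def sample_trajectory(positions, T, rng):
    """Sample one training example from a full trajectory.

    positions: array of shape (num_timestamps, num_nodes, 3).
    Picks a random initial point, takes 2T consecutive timestamps;
    the first T are the model input, the last T are the targets.
    """
    start = rng.integers(0, len(positions) - 2 * T + 1)
    window = positions[start : start + 2 * T]
    return window[:T], window[T:]  # (input states, future states)

# Illustrative usage on synthetic data (e.g. 13 atoms, 1000 frames)
rng = np.random.default_rng(0)
traj = rng.normal(size=(1000, 13, 3))
x_in, x_fut = sample_trajectory(traj, T=10, rng=rng)
```

In the paper's setup this sampling is repeated to build the 500/2000/2000 MD17 splits.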
Hardware Specification | Yes | All computational experiments, encompassing the model training, validation, and testing phases, are executed on a single NVIDIA A100-80G GPU.
Software Dependencies | No | The paper does not explicitly mention specific software dependencies or their version numbers (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | To ensure a fair comparison, all hyperparameters (e.g., learning rate, number of training epochs) are kept consistent across our model and all baselines. Detailed information can be found in Appendix B.1. Table 6: Hyper-parameters of EquiLLM and other methods. The previous length Tp denotes the length of the input sequence, the future length Tf denotes the length of the output sequence, the time lag t denotes the interval between two timestamps, the hidden size denotes the size of the hidden states in all Multi-Layer Perceptrons (MLPs) within the EquiLLM framework, and the layer denotes the number of layers. B.3 Implementation Details on Antibody Design: EquiLLM maintains identical hyper-parameters with MEAN: a 64-dimensional trainable embedding for each amino acid type, 128-dimensional hidden states, 3 network layers, a batch size of 16, and 20 training epochs. The Adam optimizer is employed with an initial learning rate of 0.001, which decays by 5% per epoch.
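The reported optimizer schedule (initial learning rate 0.001, decayed by 5% after each epoch over 20 epochs) amounts to a fixed multiplicative decay, equivalent to PyTorch's ExponentialLR with gamma=0.95. A minimal sketch of the resulting per-epoch learning rates, with illustrative names not taken from the paper:

```python
def lr_schedule(initial_lr=1e-3, decay=0.05, num_epochs=20):
    """Per-epoch learning rates under a fixed multiplicative decay.

    Epoch e uses initial_lr * (1 - decay) ** e, matching the reported
    setup: lr 0.001, reduced by 5% per epoch for 20 epochs.
    """
    return [initial_lr * (1.0 - decay) ** epoch for epoch in range(num_epochs)]

lrs = lr_schedule()
# epoch 0 uses 0.001; by epoch 19 the rate has fallen to about 3.77e-4
```

Hyperparameter counts for the other tasks (Tp, Tf, time lag, hidden size, layers) are listed in the paper's Table 6 and are not reproduced here.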