Energy-based models for atomic-resolution protein conformations
Authors: Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the model, we benchmark on the rotamer recovery task, the problem of predicting the conformation of a side chain from its context within a protein structure, which has been used to evaluate energy functions for protein design. The model achieves performance close to that of the Rosetta energy function, a state-of-the-art method widely used in protein structure prediction and design. Models were trained for 180 thousand parameter updates using 32 NVIDIA V100 GPUs, a batch size of 16,384, and the Adam optimizer (α = 2 × 10⁻⁴, β₁ = 0.99, β₂ = 0.999). We evaluated training progress using a held-out 5% subset of the training data as a validation set. |
| Researcher Affiliation | Collaboration | Yilun Du (Massachusetts Institute of Technology, Cambridge, MA); Joshua Meier (Facebook AI Research, New York, NY); Jerry Ma (Facebook AI Research, Menlo Park, CA); Rob Fergus (Facebook AI Research & New York University, New York, NY); Alexander Rives (New York University, New York, NY) |
| Pseudocode | Yes | Algorithm 1 Training Procedure for the EBM |
| Open Source Code | Yes | Data and code for experiments are available at https://github.com/facebookresearch/protein-ebm |
| Open Datasets | Yes | We constructed a curated dataset of high-resolution PDB structures using the Cull PDB database, with the following criteria: resolution finer than 1.8 Å; sequence identity less than 90%; and R value less than 0.25 as defined in Wang & Dunbrack (2003). To test the model on rotamer recovery, we use the test set of structures from Leaver-Fay et al. (2013). |
| Dataset Splits | Yes | We evaluated training progress using a held-out 5% subset of the training data as a validation set. |
| Hardware Specification | Yes | Models were trained for 180 thousand parameter updates using 32 NVIDIA V100 GPUs, a batch size of 16,384, and the Adam optimizer (α = 2 × 10⁻⁴, β₁ = 0.99, β₂ = 0.999). |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Models were trained for 180 thousand parameter updates using 32 NVIDIA V100 GPUs, a batch size of 16,384, and the Adam optimizer (α = 2 × 10⁻⁴, β₁ = 0.99, β₂ = 0.999). For all experiments, we use a 6-layer Transformer with embedding dimension of 256 (split over 8 attention heads) and feed-forward dimension of 1024. The final MLP contains 256 hidden units. The models are trained without dropout. Layer normalization (Ba et al., 2016) is applied before the attention blocks. |
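The Experiment Setup row fully specifies the Transformer dimensions, so a rough parameter budget for the trunk can be derived. A minimal sketch, assuming standard multi-head attention with biased Q/K/V/output projections and two LayerNorms per layer (details the paper does not spell out, so the totals are approximate):

```python
# Hedged estimate of the Transformer trunk's parameter count from the
# reported hyperparameters: 6 layers, d_model = 256, 8 heads, d_ff = 1024.
# Assumes biased linear projections and two LayerNorms per layer; the paper
# does not state these details, so treat the totals as approximate.

D_MODEL, D_FF, N_LAYERS = 256, 1024, 6

def linear(in_dim: int, out_dim: int) -> int:
    """Parameters in a dense layer with bias."""
    return in_dim * out_dim + out_dim

attention = 4 * linear(D_MODEL, D_MODEL)            # Q, K, V, output projections
feed_forward = linear(D_MODEL, D_FF) + linear(D_FF, D_MODEL)
layer_norms = 2 * (2 * D_MODEL)                     # scale + shift, pre-attention and pre-FFN
per_layer = attention + feed_forward + layer_norms

print(f"per layer: {per_layer:,}")                  # per layer: 789,760
print(f"6-layer trunk: {N_LAYERS * per_layer:,}")   # 6-layer trunk: 4,738,560
```

Note also that the reported global batch size of 16,384 across 32 GPUs works out to 512 examples per device, though the paper does not state how the batch was sharded.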