reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ProtPainter: Draw or Drag Protein via Topology-guided Diffusion

Authors: Zhengxi Lu, Shizhuo Cheng, Yuru Jiang, Yan Zhang, Min Zhang

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments demonstrate ProtPainter's ability to generate topology-fit (scTF > 0.8) and designable (scTM > 0.5) backbones, with drawing and dragging tasks showcasing its flexibility and versatility. Section 4 ('EXPERIMENTS') details the evaluation on metrics such as Designability, Confidence, and Similarity, reported in tables including 'Table 1: Protein Restoration Task on CATH without refolding.' and 'Table 3: Comparison of Similarity and Designability on Protein Restoration Task.'
Researcher Affiliation	Academia	Zhejiang University. All authors are affiliated with Zhejiang University, a university in China.
Pseudocode	Yes	The algorithm is shown in Algorithm 1, and details are shown in Appendix D. The paper presents 'Algorithm 1 Curve Encoder' on page 6.
Open Source Code	Yes	By installing a ChimeraX plugin, users can draw curves directly on a protein surface to define binder conditions. Secondary structure elements for the generated binder can also be assigned. The plugin installation code is available at https://github.com/lll6gg/ChimeraX_plugin_binder.
Open Datasets	Yes	PDB (Burley et al., 2023; Berman et al., 2003) with Foldseek (Van Kempen et al., 2024). CATH (Orengo et al., 1997; Sillitoe et al., 2021). We select three representative protein clusters ordered by increasing length and topological complexity: HHH ems, 1a0b cluster, and GPCR. These datasets are either widely recognized public datasets or derived from them, with proper citations.
Dataset Splits	Yes	The dataset is split into a training set and a test set in an 8:2 ratio. For EGNN, we set num_tokens to 100, dim to 32, and depth to 3.
Hardware Specification	No	Considering the sampling time, the sketching process is efficient, but backbone generation and refolding are time-consuming, taking between 10 seconds and 2 minutes on a single NVIDIA. The hardware is mentioned only generally, as 'a single NVIDIA', without a specific model number or detailed specifications.
Software Dependencies	No	The paper mentions tools such as 'ProteinMPNN (Dauparas et al., 2022)', 'OmegaFold (Wu et al., 2022)', and 'RoseTTAFold'. However, it does not provide version numbers for these or for any other ancillary software used in the experiments.
Experiment Setup	Yes	Training is done for 100 epochs with cross-entropy loss, Adam optimization, and a learning rate of 0.01. For EGNN, we set num_tokens to 100, dim to 32, and depth to 3. For training, we set the learning rate to 0.0001 and the batch size to 1, and run 2000 epochs with cross-entropy loss and Adam optimization. We set λ = 2/3, γ = 0.2, and η = 0.7, according to the ablation study in G.5. The parameters are set as N_curve = 50 per dataset, N_bb = 5, and N_seq = 8.
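For readers attempting a reproduction, the hyperparameters quoted in the Dataset Splits and Experiment Setup rows can be collected into one record. The Python below is a minimal sketch under stated assumptions: every class, field, and function name is hypothetical and not taken from the ProtPainter code. The quoted text gives two training schedules without saying which component each one trains, so they are kept unlabeled. It also does not state the roles of λ, γ, and η, and the reading of N_bb as backbones per curve and N_seq as sequences per backbone is an assumption. The 8:2 split is shown as a plain seeded shuffle, which may differ from the authors' procedure.

```python
"""Hypothetical record of the hyperparameters quoted above, plus an 8:2 split.

All names are illustrative; they are not taken from the ProtPainter code.
"""
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TrainSchedule:
    epochs: int
    learning_rate: float
    batch_size: Optional[int] = None  # None: not stated in the quoted text


@dataclass(frozen=True)
class ReportedSetup:
    # Two schedules are quoted; which component each trains is not stated.
    schedule_a: TrainSchedule = field(
        default_factory=lambda: TrainSchedule(epochs=100, learning_rate=0.01))
    schedule_b: TrainSchedule = field(
        default_factory=lambda: TrainSchedule(epochs=2000, learning_rate=1e-4, batch_size=1))
    # EGNN settings as quoted
    egnn_num_tokens: int = 100
    egnn_dim: int = 32
    egnn_depth: int = 3
    # Values chosen via the ablation study; their roles are not given here
    lam: float = 2 / 3
    gamma: float = 0.2
    eta: float = 0.7
    # Sampling counts
    n_curve: int = 50  # curves per dataset
    n_bb: int = 5      # assumed: backbones generated per curve
    n_seq: int = 8     # assumed: sequences designed per backbone

    @property
    def sequences_per_dataset(self) -> int:
        # Under the assumed nesting: 50 * 5 * 8 = 2000 sequences to refold
        return self.n_curve * self.n_bb * self.n_seq


def split_8_2(items, seed=0):
    """Shuffle a copy of `items` and return (train, test) in an 8:2 ratio."""
    pool = list(items)
    random.Random(seed).shuffle(pool)
    cut = len(pool) * 8 // 10  # integer arithmetic avoids float rounding
    return pool[:cut], pool[cut:]
```

Under the assumed nesting, each dataset yields 2000 sequences to refold, which together with the quoted 10 seconds to 2 minutes per generation/refolding run suggests why the page describes that stage as time-consuming.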
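The Research Type row reports success in terms of two thresholds: topology-fit (scTF > 0.8) and designable (scTM > 0.5). The Python below is a minimal sketch of how such a joint success rate could be computed from per-design scores. The names are hypothetical, and scTF is treated simply as a score compared against its threshold, since this page does not define how it is computed. The strict inequalities follow the quoted "> 0.8" and "> 0.5".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DesignScores:
    sc_tf: float  # topology-fit score (scTF); its computation is not defined here
    sc_tm: float  # self-consistency TM-score (scTM)


def is_success(d: DesignScores, tf_cut: float = 0.8, tm_cut: float = 0.5) -> bool:
    """True if a design is both topology-fit and designable (strict thresholds)."""
    return d.sc_tf > tf_cut and d.sc_tm > tm_cut


def success_rate(designs: List[DesignScores]) -> float:
    """Fraction of designs passing both thresholds; 0.0 for an empty list."""
    if not designs:
        return 0.0
    return sum(is_success(d) for d in designs) / len(designs)
```

For example, among the designs (0.9, 0.6), (0.85, 0.4), (0.7, 0.9), and (0.95, 0.55), only the first and last pass both thresholds, giving a success rate of 0.5.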