ProtPainter: Draw or Drag Protein via Topology-guided Diffusion
Authors: Zhengxi Lu, Shizhuo Cheng, Yuru Jiang, Yan Zhang, Min Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate ProtPainter's ability to generate topology-fit (scTF > 0.8) and designable (scTM > 0.5) backbones, with drawing and dragging tasks showcasing its flexibility and versatility. Section 4, 'EXPERIMENTS', reports evaluation on metrics such as Designability, Confidence, and Similarity, with results in 'Table 1: Protein Restoration Task on CATH without refolding.' and 'Table 3: Comparison of Similarity and Designability on Protein Restoration Task.' |
| Researcher Affiliation | Academia | Zhejiang University. All authors are affiliated with Zhejiang University, a university in China. |
| Pseudocode | Yes | The algorithm is shown in Algorithm 1, and details are shown in Appendix D. The paper presents 'Algorithm 1 Curve Encoder' on page 6. |
| Open Source Code | Yes | By installing a ChimeraX plugin, users can draw curves directly on a protein surface, defining binder conditions. Secondary structure elements for the generated binder can also be assigned. The plugin installation code is available at https://github.com/lll6gg/ChimeraX_plugin_binder. |
| Open Datasets | Yes | PDB (Burley et al., 2023; Berman et al., 2003) with Foldseek (Van Kempen et al., 2024). CATH (Orengo et al., 1997; Sillitoe et al., 2021). We select three representative protein clusters ordered by increasing length and topological complexity: HHH ems, 1a0b cluster, and GPCR. These datasets are either widely recognized public datasets or derived from them, with proper citations. |
| Dataset Splits | Yes | The dataset is split into a training set and a test set in the ratio of 8:2. For EGNN, we set num_tokens to 100, dim to 32, and depth to 3. |
| Hardware Specification | No | Considering the sampling time, the sketching process is efficient, but backbone generation and refolding are time-consuming, taking between 10 seconds and 2 minutes on a single NVIDIA. The hardware is mentioned generally as 'a single NVIDIA' but lacks specific model numbers or detailed specifications. |
| Software Dependencies | No | The paper mentions tools like 'ProteinMPNN (Dauparas et al., 2022)', 'OmegaFold (Wu et al., 2022)', and 'RoseTTAFold'. However, it does not provide specific version numbers for these or any other ancillary software components used in the experiments. |
| Experiment Setup | Yes | Training is done in 100 epochs with cross-entropy loss, Adam optimization, and a learning rate of 0.01. For EGNN, we set num_tokens to 100, dim to 32, and depth to 3. For training, we set the learning rate at 0.0001 and the batch size at 1, and run 2000 epochs with cross-entropy loss and Adam optimization. We set λ = 2/3, γ = 0.2, η = 0.7, according to the ablation study G.5. The parameters are set as N_curve = 50 per dataset, N_bb = 5, and N_seq = 8. |
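
The 8:2 split and the two quoted thresholds (scTF > 0.8, scTM > 0.5) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the shuffling step, and the seed are assumptions made for illustration, and the scores passed in are placeholders rather than values from the paper.

```python
import random

# Thresholds quoted in the Research Type row above.
SC_TF_MIN = 0.8  # topology-fit: scTF > 0.8
SC_TM_MIN = 0.5  # designable:   scTM > 0.5


def split_train_test(items, train_frac=0.8, seed=0):
    """Split items into train/test at an 8:2 ratio.

    Shuffling and the fixed seed are assumptions; the source only
    states the ratio, not how the split was drawn.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def passes_thresholds(sc_tf, sc_tm):
    """Return True when a backbone is both topology-fit and designable."""
    return sc_tf > SC_TF_MIN and sc_tm > SC_TM_MIN


if __name__ == "__main__":
    train, test = split_train_test(range(10))
    print(len(train), len(test))
    # Placeholder scores, not results from the paper.
    print(passes_thresholds(0.85, 0.6))
```

Both comparisons are strict, matching the ">" used in the quoted text, so a backbone scoring exactly 0.8 scTF or 0.5 scTM would not pass.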