Drug-TTA: Test-Time Adaptation for Drug Virtual Screening via Multi-task Meta-Auxiliary Learning

Authors: Ao Shen, Mingzhi Yuan, Yingfan Ma, Jie Du, Qiao Huang, Manning Wang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that Drug-TTA achieves state-of-the-art (SOTA) performance in all five virtual screening tasks under a zero-shot setting, showing an average improvement of 9.86% in the AUROC metric compared to the baseline without test-time adaptation. The code is available at https://github.com/ShenAoAO/Drug-TTA.git.
Researcher Affiliation | Academia | 1Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong'an Road, 200032, Shanghai, China. 2Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong'an Road, 200032, Shanghai, China. Correspondence to: Manning Wang <EMAIL>.
Pseudocode | Yes | Pseudocode for the training and the testing process is listed in Appendix C: Algorithm 1 (training stage), Algorithm 2 (testing stage).
Open Source Code | Yes | The code is available at https://github.com/ShenAoAO/Drug-TTA.git.
Open Datasets | Yes | To evaluate the performance of our method, we first assess the zero-shot performance of Drug-TTA on five virtual screening benchmarks: DUD-E (Mysinger et al., 2012), LIT-PCBA (Tran-Nguyen et al., 2020), AD (Chen et al., 2019), DEKOIS 2.0 (Bauer et al., 2013), and CASF-2016 (Su et al., 2018).
Dataset Splits | No | The paper describes the composition of the benchmark datasets and the 'zero-shot' evaluation strategy, in which test benchmarks are excluded from the training dataset and the model is retrained. For example, 'DUD-E contains 102 protein pockets and 22,886 active molecules, with an average of 224 active molecules per pocket. Each active molecule corresponds to 50 decoys.' However, it does not explicitly provide specific training/validation/test splits (e.g., percentages or exact counts) for any single dataset used to train its main model.
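The zero-shot protocol described above amounts to a simple exclusion rule: any pocket that appears in an evaluation benchmark is removed from the training pool before retraining. A minimal sketch, with illustrative pocket names and a hypothetical `zero_shot_split` helper (not from the paper's codebase):

```python
def zero_shot_split(training_pockets, benchmark_pockets):
    """Drop every training pocket that overlaps a test benchmark,
    so the model never sees the evaluation targets during training."""
    excluded = set(benchmark_pockets)
    return [p for p in training_pockets if p not in excluded]

# Illustrative example: pockets p2 and p4 belong to a test benchmark.
train = zero_shot_split(["p1", "p2", "p3", "p4"], ["p2", "p4"])
print(train)  # -> ['p1', 'p3']
```

This captures why the report marks "Dataset Splits" as No: the protocol defines what is excluded, but not explicit train/validation/test proportions.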
Hardware Specification | Yes | In the training phase, we optimize the primary task using the AdamW optimizer with a learning rate of 1e-3 and a batch size of 48, with acceleration provided by an NVIDIA A40 GPU. We conduct additional experiments comparing the memory cost and inference time of Drug-TTA and DrugCLIP under the same conditions (i.e., on an RTX 3090 GPU with a batch size of 64).
Software Dependencies | No | The paper mentions optimizers such as 'AdamW' and 'SGD' and references frameworks such as 'Uni-Mol' and 'DrugCLIP', but it does not specify version numbers for any software, libraries, or frameworks used (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | In the training phase, we optimize the primary task using the AdamW optimizer with a learning rate of 1e-3 and a batch size of 48... For optimizing the auxiliary branch, we use the SGD optimizer, setting the learning rate for the molecule branch at 1e-3 and the pocket branch at 1e-4. During inference, we update only the auxiliary branch... the learning rate for the molecule branch is set at 0.005, while the pocket branch's learning rate is 0.0001, and the batch size is increased to 64. The hyperparameter settings for the auxiliary branch model are shown in Table 5.
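The key mechanic in this setup is that, at inference, only the auxiliary-branch parameters move, each with its own SGD learning rate, while the primary branch stays frozen. A pure-Python sketch of one such test-time SGD step, using the inference-time learning rates quoted above; parameter names, gradients, and the `tta_step` helper are illustrative placeholders, not the paper's implementation:

```python
# Inference-time learning rates quoted in the report (paper's values).
AUX_LR = {"molecule": 0.005, "pocket": 0.0001}

def tta_step(params, grads):
    """One test-time SGD step: only auxiliary-branch params are updated."""
    updated = {}
    for name, value in params.items():
        branch = name.split(".")[0]       # e.g. "molecule.w" -> "molecule"
        if branch in AUX_LR:              # auxiliary branch: adapt at test time
            updated[name] = value - AUX_LR[branch] * grads[name]
        else:                             # primary branch: frozen at test time
            updated[name] = value
    return updated

params = {"molecule.w": 1.0, "pocket.w": 1.0, "primary.w": 1.0}
grads = {"molecule.w": 2.0, "pocket.w": 2.0, "primary.w": 2.0}
print(tta_step(params, grads))
```

Freezing the primary branch is what makes the adaptation cheap enough to run per test batch, which is why the report's memory/inference-time comparison against DrugCLIP is meaningful.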