DrugHash: Hashing Based Contrastive Learning for Virtual Screening
Authors: Jin Han, Yun Hong, Wu-Jun Li
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that DrugHash can outperform existing methods to achieve state-of-the-art accuracy, with at least a 32× reduction in memory cost and a 4.6× improvement in speed. |
| Researcher Affiliation | Academia | Jin Han¹*, Yun Hong²*, Wu-Jun Li¹. ¹National Key Laboratory for Novel Software Technology, School of Computer Science, Nanjing University; ²Kuang Yaming Honors School, Nanjing University. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the proposed method, training, and inference steps using mathematical equations and textual descriptions, but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | To train DrugHash, we adopt the same training datasets as DrugCLIP, which is the PDBBind dataset (Wang et al. 2005) augmented by HomoAug (Gao et al. 2023). To benchmark the VS performance of different methods, we adopt two evaluation datasets, which are DUD-E (Mysinger et al. 2012) and LIT-PCBA (Tran-Nguyen, Jacquemard, and Rognan 2020). To evaluate the memory and time cost of different VS methods, we adopt the ZINC database (Irwin et al. 2020) and the Enamine REAL database (Shivanyuk et al. 2007). The CASF-2016 dataset (Su et al. 2018) is used as the validation set. |
| Dataset Splits | Yes | To train DrugHash, we adopt the same training datasets as DrugCLIP, which is the PDBBind dataset (Wang et al. 2005) augmented by HomoAug (Gao et al. 2023). To benchmark the VS performance of different methods, we adopt two evaluation datasets, which are DUD-E (Mysinger et al. 2012) and LIT-PCBA (Tran-Nguyen, Jacquemard, and Rognan 2020). The CASF-2016 dataset (Su et al. 2018) is used as the validation set to select the best number of epochs. |
| Hardware Specification | Yes | Our model is trained on NVIDIA RTX A6000 GPUs, and each model is trained for up to 200 epochs. The time test is run on Intel Xeon Gold 6240R CPUs. |
| Software Dependencies | No | The paper mentions Faiss (Douze et al. 2024) and general deep learning concepts but does not specify version numbers for key software libraries or frameworks (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | In our implementation, we set the hyperparameter λ to 0.2. The temperature coefficient τ is set to 0.07. Each time, we sample 48 protein-molecule pairs for contrastive learning. The code length of the output binary hash codes is 128. Our model is trained on NVIDIA RTX A6000 GPUs, and each model is trained up to 200 epochs. The model is trained for five random seeds and we report the average results. We utilize gradient accumulation, performing gradient backpropagation every four steps on a single GPU card, which is equivalent to using four cards for distributed training. |
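Since the paper releases no code, the setup above can only be illustrated, not reproduced. The sketch below is a hypothetical NumPy rendering of two quoted details: a symmetric InfoNCE-style contrastive objective over a batch of 48 protein-molecule pairs with temperature τ = 0.07, and sign-based binarization into 128-bit hash codes. The function name `info_nce_loss` and the use of plain InfoNCE (without the paper's λ-weighted hashing terms) are assumptions for illustration, not DrugHash's actual loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 48 protein-molecule pairs (the paper's batch size),
# embedded into 128 dimensions (the paper's code length).
batch, dim = 48, 128
protein_emb = rng.standard_normal((batch, dim))
molecule_emb = rng.standard_normal((batch, dim))

def info_nce_loss(a, b, tau=0.07):
    """Symmetric InfoNCE contrastive loss with temperature tau.

    Matched protein-molecule pairs (the diagonal of the similarity
    matrix) are positives; all other pairs in the batch are negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau
    # Protein-to-molecule direction: log-softmax over rows.
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_a2b = -np.mean(np.diag(log_p))
    # Molecule-to-protein direction: log-softmax over columns.
    log_p_t = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    loss_b2a = -np.mean(np.diag(log_p_t))
    return 0.5 * (loss_a2b + loss_b2a)

loss = info_nce_loss(protein_emb, molecule_emb)

# Sign-based binarization: each embedding dimension becomes one bit,
# giving a 128-bit hash code per molecule.
codes = (molecule_emb > 0).astype(np.uint8)

# Arithmetic behind the reported "at least 32x" memory reduction:
# a 128-d float32 embedding occupies 128 * 32 = 4096 bits, while a
# 128-bit binary code occupies 128 bits, a 32x saving per molecule.
float_bits = dim * 32
code_bits = dim
assert float_bits // code_bits == 32
```

With binary codes, retrieval can use fast Hamming-distance search (e.g., via the Faiss library mentioned in the paper) instead of dense inner products, which is where the reported speed gain comes from.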