Deep Unsupervised Hashing via External Guidance
Authors: Qihong Song, Xiting Liu, Hongyuan Zhu, Joey Tianyi Zhou, Xi Peng, Peng Hu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four benchmark datasets demonstrate that our DUH-EG remarkably outperforms existing state-of-the-art hashing methods. |
| Researcher Affiliation | Academia | 1The College of Computer Science, Sichuan University, Chengdu 610065, China 2The State Key Laboratory of Integrated Services Network, Xidian University, Xi'an 710071, China 3Georgia Institute of Technology, USA 4I2R & CFAR, A*STAR 5Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore 6Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 7National Key Laboratory of Fundamental Algorithms and Models for Engineering Numerical Simulation, Sichuan University, China. |
| Pseudocode | Yes | To present the method flow introduced in Section 3 more clearly and systematically, we provide a detailed description of the DUH-EG learning process in Algorithm 1 and Algorithm 2. In Algorithm 1 we detail the construction of external features for each image, while Algorithm 2 illustrates the Bidirectional Contrastive Learning process. |
| Open Source Code | Yes | The code is available at: https://github.com/XLearning-SCU/2025-ICML-DUHEG |
| Open Datasets | Yes | CIFAR-10 (Krizhevsky & Hinton, 2009) Dataset contains 60,000 photographs divided into ten categories, with 6,000 images in each class. NUS-WIDE (Chua et al., 2009) Dataset includes 269,648 photos in 81 different categories. Flickr25k (Huiskes & Lew, 2008) Dataset includes 25,000 multi-label photos with 24 classes. MSCOCO (Lin et al., 2014) Dataset contains 123,287 samples. |
| Dataset Splits | Yes | CIFAR-10 (Krizhevsky & Hinton, 2009) Dataset contains 60,000 photographs divided into ten categories, with 6,000 images in each class. Following the protocol of previous studies (Qiu et al., 2021; Wang et al., 2022; Qiu et al., 2024), we randomly pick 1,000 photographs from each class as the query set, for a total of 10,000 images. We take the remaining images as the retrieval set and randomly choose 500 images per class as the training set from the retrieval set. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory amounts) are mentioned in the paper. The paper only discusses using VGG-16 and CLIP models as backbones without specifying the hardware they ran on. |
| Software Dependencies | No | The paper mentions using specific models like 'VGG-16 network' and 'CLIP (ViT-B/16) model' as backbones, but it does not provide specific version numbers for ancillary software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA versions. |
| Experiment Setup | Yes | In our training procedure, we fixed lr_min = 1×10^-5, lr_max = 1×10^-4, T_max = 60, T_warm = 10, N_k = 800, T_1 = 0.9 and T_2 = 0.97. The hashing layer of H_v consists of a two-layer multi-layer perceptron (MLP) with dimensions [F → 512 → L], where F represents the output dimension of the pre-trained backbones, while L denotes the length of the generated hash codes, selected from {16, 32, 64}. |
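
The CIFAR-10 split protocol quoted in the Dataset Splits row (1,000 query images per class, the remainder as the retrieval set, and 500 training images per class drawn from the retrieval set) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the use of a fixed seed, and the numpy-based indexing are assumptions; only the per-class counts come from the paper.

```python
import numpy as np

def split_cifar10(labels, n_query=1000, n_train=500, seed=0):
    """Per-class split: n_query -> query set, rest -> retrieval set,
    first n_train of the retrieval portion -> training set."""
    rng = np.random.default_rng(seed)
    query, retrieval, train = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        query.extend(idx[:n_query])        # 1,000 per class -> query
        rest = idx[n_query:]               # remainder -> retrieval
        retrieval.extend(rest)
        train.extend(rest[:n_train])       # 500 per class -> training
    return np.array(query), np.array(retrieval), np.array(train)

# CIFAR-10: 60,000 images, 10 classes, 6,000 images each.
labels = np.repeat(np.arange(10), 6000)
q, r, t = split_cifar10(labels)
print(len(q), len(r), len(t))  # 10000 50000 5000
```

As stated in the paper, the training set is a subset of the retrieval set, while the query set is disjoint from it.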
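
A minimal sketch of the components named in the Experiment Setup row: the [F → 512 → L] hashing head and a learning-rate schedule built from lr_min, lr_max, T_max, and T_warm. The dimensions and hyperparameter values come from the paper; the activation choices (ReLU + tanh), the warmup-then-cosine shape of the schedule, and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def hashing_head(x, w1, b1, w2, b2):
    """Two-layer MLP head [F -> 512 -> L]; tanh relaxes codes to (-1, 1)."""
    h = np.maximum(x @ w1 + b1, 0.0)   # hidden layer, ReLU (assumed)
    return np.tanh(h @ w2 + b2)        # continuous relaxation of hash code

def binarize(u):
    """Quantize relaxed codes to {-1, +1} hash bits via sign."""
    return np.where(u >= 0, 1, -1)

def learning_rate(epoch, lr_min=1e-5, lr_max=1e-4, t_max=60, t_warm=10):
    """Assumed schedule: linear warmup to lr_max, then cosine decay to lr_min."""
    if epoch < t_warm:
        return lr_max * (epoch + 1) / t_warm
    t = (epoch - t_warm) / max(t_max - t_warm, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * t))

# Example: F = 512 backbone features, L = 32-bit codes (L in {16, 32, 64}).
rng = np.random.default_rng(0)
F, L = 512, 32
w1, b1 = rng.normal(0, 0.01, (F, 512)), np.zeros(512)
w2, b2 = rng.normal(0, 0.01, (512, L)), np.zeros(L)
codes = binarize(hashing_head(rng.normal(size=(4, F)), w1, b1, w2, b2))
print(codes.shape)  # (4, 32), entries in {-1, +1}
```

At epoch T_warm the rate reaches lr_max, and by epoch T_max it has decayed back to lr_min, matching the fixed values reported above.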