Deep Unsupervised Hashing via External Guidance

Authors: Qihong Song, Xiting Liu, Hongyuan Zhu, Joey Tianyi Zhou, Xi Peng, Peng Hu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four benchmark datasets demonstrate that our DUH-EG remarkably outperforms existing state-of-the-art hashing methods.
Researcher Affiliation | Academia | 1The College of Computer Science, Sichuan University, Chengdu 610065, China; 2The State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China; 3Georgia Institute of Technology, USA; 4I2R & CFAR, A*STAR; 5Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore; 6Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore; 7National Key Laboratory of Fundamental Algorithms and Models for Engineering Numerical Simulation, Sichuan University, China.
Pseudocode | Yes | To present the method flow introduced in Section 3 more clearly and systematically, we provide a detailed description of the DUH-EG learning process in Algorithm 1 and Algorithm 2. In Algorithm 1 we detail the construction of external features for each image, while Algorithm 2 illustrates the Bidirectional Contrastive Learning process.
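The paper's two algorithms are not reproduced in this report. As a rough illustration only, the sketch below implements a generic symmetric (bidirectional) InfoNCE objective between two feature sets, which is a common shape for "bidirectional contrastive learning" between internal hash features and external features. The function names, the temperature `tau`, and the symmetric averaging are assumptions, not the paper's exact formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(queries, keys, tau=0.1):
    """One-directional InfoNCE: the i-th query should match the i-th key
    against all other keys in the batch."""
    loss = 0.0
    for i, q in enumerate(queries):
        sims = [math.exp(cosine(q, k) / tau) for k in keys]
        loss += -math.log(sims[i] / sum(sims))
    return loss / len(queries)

def bidirectional_contrastive_loss(hash_feats, external_feats, tau=0.1):
    """Symmetric form: internal-to-external plus external-to-internal."""
    return 0.5 * (info_nce(hash_feats, external_feats, tau)
                  + info_nce(external_feats, hash_feats, tau))
```

As expected of a contrastive objective, the loss is low when the two feature sets are aligned index-by-index and grows when the pairing is broken.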
Open Source Code | Yes | The code is available at: https://github.com/XLearning-SCU/2025-ICML-DUHEG
Open Datasets | Yes | CIFAR-10 (Krizhevsky & Hinton, 2009) contains 60,000 photographs divided into ten categories, with 6,000 images per class. NUS-WIDE (Chua et al., 2009) includes 269,648 photos in 81 different categories. Flickr25k (Huiskes & Lew, 2008) contains 25,000 multi-label photos spanning 24 classes. MSCOCO (Lin et al., 2014) contains 123,287 samples.
Dataset Splits | Yes | CIFAR-10 (Krizhevsky & Hinton, 2009) contains 60,000 photographs divided into ten categories, with 6,000 images per class. Following the protocol of previous studies (Qiu et al., 2021; Wang et al., 2022; Qiu et al., 2024), we randomly pick 1,000 photographs from each class as the query set, for a total of 10,000 images. We take the remaining images as the retrieval set and randomly choose 500 images per class from it as the training set.
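The quoted CIFAR-10 protocol (1,000 query images per class, the remainder as retrieval set, and 500 per class sampled from the retrieval set for training) can be sketched as a plain index split. The function name `split_cifar10` and the seeding are illustrative, not the authors' code.

```python
import random

def split_cifar10(labels, n_query_per_class=1000, n_train_per_class=500, seed=0):
    """Split sample indices per the quoted protocol:
    per class, 1,000 query images; the rest form the retrieval set,
    from which 500 per class are drawn as the training set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    query, retrieval, train = [], [], []
    for y, idxs in by_class.items():
        rng.shuffle(idxs)
        query.extend(idxs[:n_query_per_class])
        rest = idxs[n_query_per_class:]
        retrieval.extend(rest)
        train.extend(rest[:n_train_per_class])
    return query, retrieval, train
```

With 6,000 images per class this yields 10,000 query, 50,000 retrieval, and 5,000 training indices, with the training set a subset of the retrieval set and disjoint from the query set.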
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory amounts) are mentioned in the paper. The paper only discusses using VGG-16 and CLIP models as backbones, without specifying the hardware they ran on.
Software Dependencies | No | The paper mentions using specific models such as the 'VGG-16 network' and the 'CLIP (ViT-B/16) model' as backbones, but it does not provide version numbers for ancillary software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA.
Experiment Setup | Yes | In our training procedure, we fixed lr_min = 1×10⁻⁵, lr_max = 1×10⁻⁴, T_max = 60, T_warm = 10, N_k = 800, T_1 = 0.9, and T_2 = 0.97. The hashing layer H_v consists of a two-layer multi-layer perceptron (MLP) with dimensions [F → 512 → L], where F represents the output dimension of the pre-trained backbones, while L denotes the length of the generated hash codes, selected from {16, 32, 64}.
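The quote fixes lr_min, lr_max, T_max, and T_warm but does not spell out the schedule connecting them. A common reading, sketched below under that assumption, is linear warmup from lr_min to lr_max over the first T_warm epochs followed by cosine annealing back to lr_min by epoch T_max; the function name and interpolation details are illustrative.

```python
import math

def lr_at_epoch(t, lr_min=1e-5, lr_max=1e-4, t_max=60, t_warm=10):
    """Learning rate at epoch t: linear warmup from lr_min to lr_max over
    the first t_warm epochs, then cosine annealing down to lr_min at t_max."""
    if t < t_warm:
        return lr_min + (lr_max - lr_min) * t / t_warm
    progress = (t - t_warm) / (t_max - t_warm)  # 0 at end of warmup, 1 at t_max
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

Under this reading the rate starts at 1e-5, peaks at 1e-4 at epoch 10, and returns to 1e-5 at epoch 60.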