SHE: Streaming-media Hashing Retrieval
Authors: Ruitao Pu, Yang Qin, Xiaomin Song, Dezhong Peng, Zhenwen Ren, Yuan Sun
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments of streaming-media retrieval on four widely used multi-modal datasets demonstrate the superiority and effectiveness of our proposed SHE compared with 14 state-of-the-art methods. |
| Researcher Affiliation | Collaboration | 1School of Computer Science, Sichuan University, Chengdu, China 2Sichuan National Innovation New Vision UHD Video Technology Co., Ltd., Chengdu, China 3Tianfu Jincheng Laboratory, Chengdu, China 4Southwest University of Science and Technology, Mianyang, China 5National Key Laboratory of Fundamental Algorithms and Models for Engineering Numerical Simulation, Sichuan University, Chengdu, China. Correspondence to: Yuan Sun <sunyuan EMAIL>. |
| Pseudocode | Yes | Algorithm 1 The training process of our SHE |
| Open Source Code | Yes | https://github.com/perquisite/SHE |
| Open Datasets | Yes | In our experiment, we evaluate the proposed SHE on four widely used multimedia datasets, namely, Wikipedia (Rasiwasia et al., 2010), NUS-WIDE (Chua et al., 2009), XMedia (Peng et al., 2015), and XMedia Net (Peng et al., 2018). |
| Dataset Splits | Yes | In Tab.1, we summarize the statistics of the datasets. Notably, to align with the streaming-media scenario, we gradually incorporate the modalities into the training process in the order they are collected within the datasets. More details about the four datasets are presented in the appendix. Wikipedia is a dataset comprising 2,866 image-text pairs with 10 categories. For our experiment, we randomly allocate 2,173 and 693 pairs for training and testing, respectively, while all pairs are used as the retrieval database. |
| Hardware Specification | Yes | all experiments are conducted on a single GeForce RTX 3090 Ti 24GB GPU. |
| Software Dependencies | No | Additionally, the SHE framework is implemented by the PyTorch toolkit. No specific version number for PyTorch is mentioned. |
| Experiment Setup | Yes | In our SHE, all modality-specific sub-networks comprise three fully connected layers, with the ReLU activation function applied after the first two layers and an ℓ2-normalization operation applied to the last layer. Their dimensions are [d, 4096, 4096, L], where d represents the input feature dimension of the corresponding modality. For all datasets, we set the batch size nb as 256, the similarity boundary σ as 0.95, the iteration number T as 300, the number of prototype vectors K as 3, and the hyperparameter α as 1. For the four datasets, we set the hyperparameter β as 4, 5, 1, and 6, respectively. Algorithm 1 also mentions 'learning rate lr'. |
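
The sub-network architecture described in the Experiment Setup row (three fully connected layers with dimensions [d, 4096, 4096, L], ReLU after the first two layers, ℓ2-normalization on the last) can be sketched in PyTorch as follows. This is a minimal illustration based only on the quoted description, not the authors' released code; the class name `ModalitySubNet` and all variable names are our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalitySubNet(nn.Module):
    """Modality-specific sub-network as described in the paper's setup:
    three FC layers [d, 4096, 4096, L], ReLU after the first two,
    l2-normalization applied to the output of the last layer."""

    def __init__(self, d: int, L: int):
        super().__init__()
        self.fc1 = nn.Linear(d, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, L)  # L = hash code length

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        h = self.fc3(x)
        # l2-normalize so each continuous code lies on the unit hypersphere
        return F.normalize(h, p=2, dim=1)
```

With the reported batch size of 256, a forward pass for a modality with input dimension d and code length L yields a (256, L) tensor whose rows have unit ℓ2 norm.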