Statistical Model-driven Similarity Hashing: Bridging Modalities for Efficient Unsupervised Retrieval
Authors: Mingjin Kuai, Jun Long, Zhan Yang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three benchmark datasets demonstrate the excellent performance of the SMSH method. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Central South University, Changsha 410083, China; 2 Big Data Institute, Central South University, Changsha 410083, China |
| Pseudocode | Yes | Algorithm 1: Statistical Model-driven Similarity Hashing |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code or a link to a code repository. |
| Open Datasets | Yes | NUS-WIDE (Chua et al. 2009) consists of 269,648 image-text pairs in 81 concepts. MIRFlickr (Huiskes and Lew 2008) consists of 25,000 image-text pairs containing 24 unique concepts. MSCOCO (Lin et al. 2014) contains 123,287 image-text pairs in 80 separate categories. |
| Dataset Splits | Yes | Following (Zhu et al. 2023), we select 186,577 image-text pairs corresponding to the 10 most common concepts; 2,000 image-text pairs are randomly selected as the query set, and the rest form the database retrieval set (containing 10,000 training pairs). ... For a fair comparison, we follow the experimental settings of (Zhu et al. 2023) to randomly select 2,000 pairs as the query set and the rest as the database retrieval set (containing 10,000 training pairs). ... Similar to the setup of the previous two datasets, we randomly select 2,000 image-text pairs as the query set and the rest as the database retrieval set (containing 10,000 training pairs). |
| Hardware Specification | No | The paper mentions using AlexNet for feature extraction, but it does not specify any hardware details such as the GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using AlexNet and bag-of-words (BoW) features, and the Adam optimization algorithm. However, it does not provide specific version numbers for any software, libraries, or frameworks used. |
| Experiment Setup | Yes | The visual encoder is a multi-layer perceptron (d_v → 4096 → ReLU → l) and the decoder is also a multi-layer perceptron (l → 4096 → ReLU → d_v), where ReLU is the activation function. The structure of the textual encoder and decoder is similar to that of the visual ones. We adopt the Adam optimization algorithm (Kingma and Ba 2015) with the learning rate set to 0.0001 on all three datasets. The mini-batch size m is 64. For simplicity, we set φ1 = φ2 = 3 on the three datasets. We cross-validate the hyper-parameters ζ, α, β, γ, ξ, and set ζ = 0.8, α = 0.4, β = 0.2, γ = 0.4, ξ = 3 for NUS-WIDE; ζ = 0.6, α = 0.3, β = 0.2, γ = 0.5, ξ = 3 for MIRFlickr; and ζ = 0.6, α = 0.3, β = 0.3, γ = 0.4, ξ = 1.5 for MSCOCO. Meanwhile, we cross-validate the hyper-parameters ρ and ω, and set ρ = 6, ω = 2 for NUS-WIDE; ρ = 6, ω = 0.5 for MIRFlickr; and ρ = 4, ω = 2 for MSCOCO. |
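The encoder/decoder shapes quoted above (input dim → 4096 → ReLU → code length, and the mirror image for the decoder) can be sketched as a minimal forward pass. This is a hedged illustration only: the feature dimension `d_v = 4096` (typical of AlexNet fc features) and code length `l = 64` are assumptions not stated in the excerpt, the weights are random rather than trained, and the sign-based binarization is a common hashing convention, not necessarily the paper's exact quantizer. Only the mini-batch size `m = 64` comes from the quote.

```python
import numpy as np

def relu(x):
    # ReLU activation used between the two linear layers
    return np.maximum(x, 0.0)

class MLPCoder:
    """Single-hidden-layer MLP: in_dim -> 4096 -> ReLU -> out_dim."""
    def __init__(self, in_dim, out_dim, hidden=4096, seed=0):
        rng = np.random.default_rng(seed)
        # Random (untrained) weights, for shape illustration only
        self.W1 = rng.normal(0.0, 0.01, (in_dim, hidden))
        self.W2 = rng.normal(0.0, 0.01, (hidden, out_dim))

    def forward(self, x):
        return relu(x @ self.W1) @ self.W2

d_v, l, m = 4096, 64, 64            # feature dim (assumed), code length (assumed), batch size (from paper)
encoder = MLPCoder(d_v, l)          # visual encoder: d_v -> 4096 -> ReLU -> l
decoder = MLPCoder(l, d_v, seed=1)  # visual decoder: l -> 4096 -> ReLU -> d_v

batch = np.random.default_rng(2).normal(size=(m, d_v))
codes = np.sign(encoder.forward(batch))  # binarize continuous outputs to +/-1 hash codes
recon = decoder.forward(codes)           # reconstruct features from codes
print(codes.shape, recon.shape)          # (64, 64) (64, 4096)
```

The textual branch would follow the same pattern with the BoW dimension in place of `d_v`; training (reconstruction loss plus the similarity terms weighted by the hyper-parameters listed above) is outside the scope of this sketch.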