Distribution-Consistency-Guided Multi-modal Hashing

Authors: Jin-Yu Liu, Xian-Ling Mao, Tian-Yi Che, Rong-Cheng Tu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on three widely used datasets demonstrate the superiority of the proposed method compared to state-of-the-art baselines in multi-modal retrieval tasks. Experimental results are depicted in Figure 3, from which we can obtain the following observations: (1) Our proposed method DCGMH consistently achieves the best retrieval performance across nearly all noisy label ratios, which demonstrates that DCGMH effectively mitigates the negative impact of noisy labels in the training set and showcases its exceptional robustness. (2) As the noisy label ratio increases, DCGMH shows the slowest performance decline compared to other models, indicating its lower sensitivity to noisy labels.
Researcher Affiliation Academia Jin-Yu Liu, Xian-Ling Mao*, Tian-Yi Che, Rong-Cheng Tu School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China EMAIL, EMAIL
Pseudocode No The paper describes the method using mathematical formulations and descriptive text, but does not include an explicit 'Pseudocode' or 'Algorithm' block.
Open Source Code Yes Code https://github.com/LiuJinyu1229/DCGMH
Open Datasets Yes To validate this hypothesis, we conduct a Box Plot statistical analysis comparing the average similarity scores between hash codes and their respective in-category and out-category centers. This analysis is performed on the MIR Flickr dataset, which contains a training set of 5,000 instances with a noisy label ratio of 40%. Extensive experiments on three benchmark datasets demonstrate that the proposed method outperforms the state-of-the-art baselines in multi-modal retrieval tasks. To simulate the model's retrieval performance in real-world noisy label scenarios, we conduct experiments on three datasets assuming 40% noisy labels in the training set, with the MAP and PR curve comparison with all baselines shown in Table 1 and Figure 2. Taking the MIR Flickr dataset as an example, we explore the impact of different values of the hyperparameters α, β, γ, and η on model performance with the hash code length of 64 bits and the noisy label ratio of 40%. Similar trends are observed on the NUS-WIDE and COCO, where the corresponding hyperparameters are set to 1.5, 0.05, 5, 1, and 1.2, 0.2, 5, 1, respectively.
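The quoted experiments assume a training set whose labels have been corrupted at a fixed noisy-label ratio (e.g. 40%). The paper does not specify the exact corruption procedure, so the sketch below uses a simple symmetric scheme on a multi-hot label matrix (as in MIR Flickr-style multi-label data); the function name and the one-random-category replacement rule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def inject_label_noise(labels, noise_ratio=0.4, seed=0):
    """Corrupt a fraction of a multi-hot (N, C) label matrix.

    A hypothetical symmetric-noise sketch: each selected instance's
    label vector is replaced with a single random category. The paper
    does not state its actual corruption procedure.
    """
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n, c = labels.shape
    # Pick noise_ratio * N distinct instances to corrupt.
    idx = rng.choice(n, size=int(noise_ratio * n), replace=False)
    for i in idx:
        flipped = np.zeros(c, dtype=labels.dtype)
        flipped[rng.integers(0, c)] = 1  # assign one random category
        noisy[i] = flipped
    return noisy, idx

# Example scale matching the quoted analysis: 5,000 training instances,
# 24 categories (MIR Flickr has 24 concept labels), 40% noise.
rng = np.random.default_rng(1)
clean = np.zeros((5000, 24), dtype=np.int64)
clean[np.arange(5000), rng.integers(0, 24, 5000)] = 1
noisy, corrupted_idx = inject_label_noise(clean, noise_ratio=0.4)
print(len(corrupted_idx))  # 2000 instances selected for corruption
```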
Dataset Splits No The paper mentions a "training set of 5,000 instances with a noisy label ratio of 40%" for a specific analysis on the MIR Flickr dataset, and generally discusses a "training set" for experiments with varying noisy label ratios. However, it does not provide explicit train/test/validation split percentages, counts, or references to predefined standard splits for the main experiments conducted on MIR Flickr, NUS-WIDE, and MS COCO.
Hardware Specification Yes During the training of the hashing network on a single NVIDIA RTX 3090Ti GPU, the SGD optimizer with a batch size of 48 is adopted for parameter optimization, with an initial learning rate set to 0.005, 0.001, and 0.01 for the three datasets, respectively.
Software Dependencies No The paper mentions using a 'bag-of-words (BoW) model', 'VGG model', 'multi-layer perceptrons (MLP)', and 'SGD optimizer' but does not specify any software libraries or frameworks with their version numbers (e.g., PyTorch, TensorFlow, scikit-learn versions).
Experiment Setup Yes Similar to DIOR, we first perform warm-up training without noisy label filtering and correction to allow the model to learn the basic hashing mapping capabilities, with the warm-up epochs set to 5, 5, and 30 for the MIR Flickr, NUS-WIDE, and MS COCO, respectively. During the training of the hashing network on a single NVIDIA RTX 3090Ti GPU, the SGD optimizer with a batch size of 48 is adopted for parameter optimization, with an initial learning rate set to 0.005, 0.001, and 0.01 for the three datasets, respectively. Regarding hyper-parameters, α and β are set to 1, 1.5, 1.2 and 0.15, 0.05, 0.2 for the three datasets, respectively. γ and η are empirically set to 5 and 1 for all three datasets, respectively, which will be discussed later.
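The per-dataset settings quoted above (SGD, batch size 48, learning rates 0.005/0.001/0.01, warm-up epochs 5/5/30) can be collected into a reproduction config. The sketch below is a minimal PyTorch stand-in: the `nn.Sequential` hashing head, feature dimension, and function names are illustrative assumptions, not the authors' DCGMH architecture.

```python
import torch
import torch.nn as nn

# Settings quoted from the paper; everything else here is a placeholder.
CONFIG = {
    "mirflickr": {"lr": 0.005, "warmup_epochs": 5},
    "nuswide":   {"lr": 0.001, "warmup_epochs": 5},
    "mscoco":    {"lr": 0.01,  "warmup_epochs": 30},
}
BATCH_SIZE = 48

def build_trainer(dataset, feat_dim=512, code_len=64):
    """Return a toy hashing head, its SGD optimizer, and warm-up epochs."""
    cfg = CONFIG[dataset]
    # Stand-in hashing head: fused features -> K-bit relaxed codes.
    net = nn.Sequential(nn.Linear(feat_dim, code_len), nn.Tanh())
    opt = torch.optim.SGD(net.parameters(), lr=cfg["lr"])
    return net, opt, cfg["warmup_epochs"]

net, opt, warmup = build_trainer("mirflickr")
x = torch.randn(BATCH_SIZE, 512)
codes = net(x)               # relaxed hash codes in (-1, 1)
binary = torch.sign(codes)   # binarized for retrieval at test time
```

During the warm-up epochs the paper trains without noisy-label filtering or correction, so a reproduction would simply skip those components for the first `warmup` epochs before enabling them.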