Distribution-Consistency-Guided Multi-modal Hashing

Authors: Jin-Yu Liu, Xian-Ling Mao, Tian-Yi Che, Rong-Cheng Tu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on three widely used datasets demonstrate the superiority of the proposed method compared to state-of-the-art baselines in multi-modal retrieval tasks. Experimental results are depicted in Figure 3, from which we can obtain the following observations: (1) Our proposed method DCGMH consistently achieves the best retrieval performance across nearly all noisy label ratios, which demonstrates that DCGMH effectively mitigates the negative impact of noisy labels in the training set and showcases its exceptional robustness. (2) As the noisy label ratio increases, DCGMH shows the slowest performance decline compared to other models, indicating its lower sensitivity to noisy labels.
Researcher Affiliation Academia Jin-Yu Liu, Xian-Ling Mao*, Tian-Yi Che, Rong-Cheng Tu School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China EMAIL, EMAIL
Pseudocode No The paper describes the method using mathematical formulations and descriptive text, but does not include an explicit 'Pseudocode' or 'Algorithm' block.
Open Source Code Yes Code https://github.com/LiuJinyu1229/DCGMH
Open Datasets Yes To validate this hypothesis, we conduct a Box Plot statistical analysis comparing the average similarity scores between hash codes and their respective in-category and out-category centers. This analysis is performed on the MIR Flickr dataset, which contains a training set of 5,000 instances with a noisy label ratio of 40%. Extensive experiments on three benchmark datasets demonstrate that the proposed method outperforms the state-of-the-art baselines in multi-modal retrieval tasks. To simulate the model's retrieval performance in real-world noisy label scenarios, we conduct experiments on three datasets assuming 40% noisy labels in the training set, with the MAP and PR curve comparison with all baselines shown in Table 1 and Figure 2. Taking the MIR Flickr dataset as an example, we explore the impact of different values of the hyperparameters α, β, γ, and η on model performance with the hash code length of 64 bits and the noisy label ratio of 40%. Similar trends are observed on the NUS-WIDE and COCO, where the corresponding hyperparameters are set to 1.5, 0.05, 5, 1, and 1.2, 0.2, 5, 1, respectively.
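The quoted experiments assume a training set whose labels have been corrupted at a fixed noisy-label ratio (e.g. 40%). The paper does not specify the exact corruption procedure, so the sketch below uses a simple symmetric scheme on a multi-hot label matrix (as in MIR Flickr-style multi-label data); the function name and the one-random-category replacement rule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def inject_label_noise(labels, noise_ratio=0.4, seed=0):
    """Corrupt a fraction of a multi-hot (N, C) label matrix.

    A hypothetical symmetric-noise sketch: each selected instance's
    label vector is replaced with a single random category. The paper
    does not state its actual corruption procedure.
    """
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n, c = labels.shape
    # Pick noise_ratio * N distinct instances to corrupt.
    idx = rng.choice(n, size=int(noise_ratio * n), replace=False)
    for i in idx:
        flipped = np.zeros(c, dtype=labels.dtype)
        flipped[rng.integers(0, c)] = 1  # assign one random category
        noisy[i] = flipped
    return noisy, idx

# Example scale matching the quoted analysis: 5,000 training instances,
# 24 categories (MIR Flickr has 24 concept labels), 40% noise.
rng = np.random.default_rng(1)
clean = np.zeros((5000, 24), dtype=np.int64)
clean[np.arange(5000), rng.integers(0, 24, 5000)] = 1
noisy, corrupted_idx = inject_label_noise(clean, noise_ratio=0.4)
print(len(corrupted_idx))  # 2000 instances selected for corruption
```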
Dataset Splits No The paper mentions a "training set of 5,000 instances with a noisy label ratio of 40%" for a specific analysis on the MIR Flickr dataset, and generally discusses a "training set" for experiments with varying noisy label ratios. However, it does not provide explicit train/test/validation split percentages, counts, or references to predefined standard splits for the main experiments conducted on MIR Flickr, NUS-WIDE, and MS COCO.
Hardware Specification Yes During the training of the hashing network on a single NVIDIA RTX 3090Ti GPU, the SGD optimizer with a batch size of 48 is adopted for parameter optimization, with an initial learning rate set to 0.005, 0.001, and 0.01 for the three datasets, respectively.
Software Dependencies No The paper mentions using a 'bag-of-words (BoW) model', 'VGG model', 'multi-layer perceptrons (MLP)', and 'SGD optimizer' but does not specify any software libraries or frameworks with their version numbers (e.g., PyTorch, TensorFlow, scikit-learn versions).
Experiment Setup Yes Similar to DIOR, we first perform warm-up training without noisy label filtering and correction to allow the model to learn the basic hashing mapping capabilities, with the warm-up epochs set to 5, 5, and 30 for the MIR Flickr, NUS-WIDE, and MS COCO, respectively. During the training of the hashing network on a single NVIDIA RTX 3090Ti GPU, the SGD optimizer with a batch size of 48 is adopted for parameter optimization, with an initial learning rate set to 0.005, 0.001, and 0.01 for the three datasets, respectively. Regarding hyper-parameters, α and β are set to 1, 1.5, 1.2 and 0.15, 0.05, 0.2 for the three datasets, respectively. γ and η are empirically set to 5 and 1 for all three datasets, respectively, which will be discussed later.
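The per-dataset settings quoted above (SGD, batch size 48, learning rates 0.005/0.001/0.01, warm-up epochs 5/5/30) can be collected into a reproduction config. The sketch below is a minimal PyTorch stand-in: the `nn.Sequential` hashing head, feature dimension, and function names are illustrative assumptions, not the authors' DCGMH architecture.

```python
import torch
import torch.nn as nn

# Settings quoted from the paper; everything else here is a placeholder.
CONFIG = {
    "mirflickr": {"lr": 0.005, "warmup_epochs": 5},
    "nuswide":   {"lr": 0.001, "warmup_epochs": 5},
    "mscoco":    {"lr": 0.01,  "warmup_epochs": 30},
}
BATCH_SIZE = 48

def build_trainer(dataset, feat_dim=512, code_len=64):
    """Return a toy hashing head, its SGD optimizer, and warm-up epochs."""
    cfg = CONFIG[dataset]
    # Stand-in hashing head: fused features -> K-bit relaxed codes.
    net = nn.Sequential(nn.Linear(feat_dim, code_len), nn.Tanh())
    opt = torch.optim.SGD(net.parameters(), lr=cfg["lr"])
    return net, opt, cfg["warmup_epochs"]

net, opt, warmup = build_trainer("mirflickr")
x = torch.randn(BATCH_SIZE, 512)
codes = net(x)               # relaxed hash codes in (-1, 1)
binary = torch.sign(codes)   # binarized for retrieval at test time
```

During the warm-up epochs the paper trains without noisy-label filtering or correction, so a reproduction would simply skip those components for the first `warmup` epochs before enabling them.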