DiCA: Disambiguated Contrastive Alignment for Cross-Modal Retrieval with Partial Labels

Authors: Chao Su, Huiming Zheng, Dezhong Peng, Xu Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on four benchmarks validate the effectiveness of our proposed method, which demonstrates enhanced performance over existing state-of-the-art methods.
Researcher Affiliation | Collaboration | (1) The College of Computer Science, Sichuan University, Chengdu, China; (2) Sichuan National Innovation New Vision UHD Video Technology Co., Ltd., Chengdu, China. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations, but it does not include a clearly labeled pseudocode block or algorithm section.
Open Source Code | Yes | Code: https://github.com/Rose-bud/DiCA
Open Datasets | Yes | To evaluate the effectiveness of our method, we conduct extensive comparison experiments on four cross-modal retrieval benchmark datasets. These datasets are introduced as follows: 1) Wikipedia contains 2,866 image-text pairs... 2) INRIA-Websearch consists of 71,478 images and 71,478 text descriptions... 3) NUS-WIDE consists of about 270,000 images... 4) XMediaNet is a large-scale multimodal dataset...
Dataset Splits | Yes | 1) Wikipedia contains 2,866 image-text pairs that belong to 10 classes. Following the previous work (Feng, Wang, and Li 2014), we divide the dataset into 3 subsets: 2,173, 231, and 462 pairs for training, validation, and testing sets, respectively. 2) INRIA-Websearch... divide the dataset into three subsets: 9,000, 1,332, and 4,366 image-text pairs for training, validation, and testing sets, respectively. 3) NUS-WIDE... split the dataset into three subsets, i.e., 42,941; 5,000; and 23,661 image-text pairs for training, validation, and testing sets, respectively. 4) XMediaNet... divide them into 32,000, 4,000, and 4,000 pairs for training, validation, and testing sets, respectively.
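For reference, the reported split sizes can be collected into one table for scripting. A minimal sketch, using only the counts quoted above; the `SPLITS` and `total_pairs` names are illustrative, not from the paper or its code:

```python
# Train/validation/test split sizes (image-text pairs) as reported in the row above.
SPLITS = {
    "Wikipedia":       {"train": 2_173,  "val": 231,   "test": 462},
    "INRIA-Websearch": {"train": 9_000,  "val": 1_332, "test": 4_366},
    "NUS-WIDE":        {"train": 42_941, "val": 5_000, "test": 23_661},
    "XMediaNet":       {"train": 32_000, "val": 4_000, "test": 4_000},
}

def total_pairs(dataset: str) -> int:
    """Sum of the three subsets for one dataset."""
    return sum(SPLITS[dataset].values())

# Sanity check: Wikipedia's three subsets cover all 2,866 pairs.
# (The other datasets use a subset of the full collection, so no such check applies.)
assert total_pairs("Wikipedia") == 2_866
```

Note that only Wikipedia is split exhaustively; for the remaining benchmarks the subsets are sampled from a larger pool, so their totals are smaller than the full dataset sizes quoted in the Open Datasets row.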
Hardware Specification | Yes | Our DiCA is implemented on the PyTorch framework, and all experiments are conducted on four Nvidia GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions the 'PyTorch' framework, the 'Adam' optimizer, 'VGG-19', 'Doc2Vec', 'AlexNet', and 'LDA' as models or tools used, but specific version numbers for these software components are not provided.
Experiment Setup | Yes | In this work, we adopt the Adam (Kingma and Ba 2014) optimizer with a learning rate of 0.0001 to update the parameters. For all datasets, we set the maximum number of training epochs to 100. The training batch size is set to 32 for the Wikipedia dataset, and to 512 for the other datasets. Furthermore, to maintain consistency, the batch size during validation and testing is uniformly set to 256.
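The hyperparameters reported in the row above can be summarized in a small configuration sketch. This is only an illustration of the reported values, not the authors' code; the constant and function names are assumptions:

```python
# Reported training hyperparameters (see the Experiment Setup row above).
LEARNING_RATE = 1e-4    # Adam optimizer (Kingma and Ba 2014)
MAX_EPOCHS = 100        # maximum training epochs, all datasets
EVAL_BATCH_SIZE = 256   # uniform batch size for validation and testing

def train_batch_size(dataset: str) -> int:
    """Training batch size: 32 for Wikipedia, 512 for the other benchmarks."""
    return 32 if dataset == "Wikipedia" else 512
```

In a PyTorch reproduction, these values would feed `torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)` and the `batch_size` argument of the train and eval `DataLoader`s.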