Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning with Noisy Correspondence for Cross-modal Matching

Authors: Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, Xi Peng

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type | Experimental | "To verify the effectiveness of our method, we conduct experiments by using the image-text matching as a showcase. Extensive experiments on Flickr30K, MS-COCO, and Conceptual Captions verify the effectiveness of our method."
Researcher Affiliation | Collaboration | Zhenyu Huang (College of Computer Science, Sichuan University, China); Guocheng Niu (Baidu Inc., China); Xiao Liu (TAL Education Group); Wenbiao Ding (TAL Education Group); Xinyan Xiao (Baidu Inc., China); Hua Wu (Baidu Inc., China); Xi Peng (College of Computer Science, Sichuan University, China)
Pseudocode | Yes | Algorithm 1: Noisy Correspondence Rectifier
Open Source Code | No | "The code could be accessed from www.pengxi.me." This URL is a general personal website: it does not explicitly state that it contains the source code for the methodology or experiments described in the paper, nor is it a direct link to a code repository.
Open Datasets | Yes | "In the experiments, we use three benchmark datasets including Flickr30K [42], MS-COCO [23], and Conceptual Captions [35]."
Dataset Splits | Yes | "Flickr30K contains 31,000 images collected from the Flickr website with five captions each. Following [19], we use 1,000 images for validation, 1,000 images for testing, and the rest for training. MS-COCO contains 123,287 images with five captions each. We follow the data partition in [19] which consists of 113,287 training images, 5,000 validation images, and 5,000 test images. In our experiments, we use a subset of Conceptual Captions for evaluation, named CC152K. Specifically, we randomly select 150,000 samples from the training split for training, 1,000 samples from the validation split for validation, and 1,000 samples from the validation split for testing."
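The CC152K construction above is specific enough to sketch in code. The helper below is a hypothetical reconstruction, not the authors' script: the function name, the fixed seed, and the assumption that the 1,000-sample validation and test sets are drawn disjointly from the Conceptual Captions validation split are all ours.

```python
import random

def make_cc152k_splits(cc_train_pool, cc_val_pool, seed=0):
    """Sketch of the CC152K sampling described in the paper:
    150,000 random samples from the CC training split for training,
    and 1,000 + 1,000 samples from the CC validation split for
    validation and testing (disjointness is our assumption)."""
    rng = random.Random(seed)  # fixed seed for reproducibility (our choice)
    train = rng.sample(cc_train_pool, 150_000)
    held_out = rng.sample(cc_val_pool, 2_000)
    val, test = held_out[:1_000], held_out[1_000:]
    return train, val, test
```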
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the "Adam optimizer [16]" but does not provide version numbers for key software components (e.g., Python, PyTorch, TensorFlow, CUDA) used in its implementation.
Experiment Setup | Yes | "We train our network using the Adam optimizer [16] with the default parameters and a batch size of 128. In addition, we fix the margin α = 0.2 and m = 10 for the soft margin through the experiments."
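For context on the reported hyperparameters, here is a minimal sketch of the standard hinge-based triplet ranking loss widely used in image-text matching, with the paper's margin α = 0.2. The function name and signature are hypothetical, and the paper's soft-margin rectified variant (which uses m = 10 to soften α per sample) is not reproduced here.

```python
def triplet_ranking_loss(s_pos, s_neg_img, s_neg_txt, alpha=0.2):
    """Hinge-based triplet ranking loss for one matched image-text pair.

    s_pos:     similarity of the matched (image, text) pair
    s_neg_img: similarity of the image's hardest negative caption... er,
               of the text's hardest negative image
    s_neg_txt: similarity of the image's hardest negative caption
    alpha:     margin, fixed to 0.2 in the paper
    """
    # Penalize negatives that come within alpha of the positive score.
    return (max(0.0, alpha - s_pos + s_neg_txt)
            + max(0.0, alpha - s_pos + s_neg_img))
```

With a well-separated positive (e.g. `s_pos=0.9` vs. negatives at 0.5 and 0.6) the loss is zero; when negatives tie the positive, each term contributes the full margin.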