Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning with Noisy Correspondence for Cross-modal Matching

Authors: Zhenyu Huang, Guocheng Niu, Xiao Liu, Wenbiao Ding, Xinyan Xiao, Hua Wu, Xi Peng

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type | Experimental | "To verify the effectiveness of our method, we conduct experiments by using the image-text matching as a showcase. Extensive experiments on Flickr30K, MS-COCO, and Conceptual Captions verify the effectiveness of our method."
Researcher Affiliation | Collaboration | Zhenyu Huang (College of Computer Science, Sichuan University, China); Guocheng Niu (Baidu Inc., China); Xiao Liu (TAL Education Group); Wenbiao Ding (TAL Education Group); Xinyan Xiao (Baidu Inc., China); Hua Wu (Baidu Inc., China); Xi Peng (College of Computer Science, Sichuan University, China)
Pseudocode | Yes | Algorithm 1: Noisy Correspondence Rectifier
Open Source Code | No | "The code could be accessed from www.pengxi.me." This URL is a general personal website: it does not explicitly state that it contains the source code for the methodology or experiments described in the paper, nor is it a direct link to a code repository.
Open Datasets | Yes | "In the experiments, we use three benchmark datasets including Flickr30K [42], MS-COCO [23], and Conceptual Captions [35]."
Dataset Splits | Yes | "Flickr30K contains 31,000 images collected from the Flickr website with five captions each. Following [19], we use 1,000 images for validation, 1,000 images for testing, and the rest for training. MS-COCO contains 123,287 images with five captions each. We follow the data partition in [19] which consists of 113,287 training images, 5,000 validation images, and 5,000 test images. In our experiments, we use a subset of Conceptual Captions for evaluation, named CC152K. Specifically, we randomly select 150,000 samples from the training split for training, 1,000 samples from the validation split for validation, and 1,000 samples from the validation split for testing."
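The CC152K construction above is specific enough to sketch in code. The helper below is a hypothetical reconstruction, not the authors' script: the function name, the fixed seed, and the assumption that the 1,000-sample validation and test sets are drawn disjointly from the Conceptual Captions validation split are all ours.

```python
import random

def make_cc152k_splits(cc_train_pool, cc_val_pool, seed=0):
    """Sketch of the CC152K sampling described in the paper:
    150,000 random samples from the CC training split for training,
    and 1,000 + 1,000 samples from the CC validation split for
    validation and testing (disjointness is our assumption)."""
    rng = random.Random(seed)  # fixed seed for reproducibility (our choice)
    train = rng.sample(cc_train_pool, 150_000)
    held_out = rng.sample(cc_val_pool, 2_000)
    val, test = held_out[:1_000], held_out[1_000:]
    return train, val, test
```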
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the "Adam optimizer [16]" but does not provide version numbers for key software components (e.g., Python, PyTorch, TensorFlow, CUDA) used in its implementation.
Experiment Setup | Yes | "We train our network using the Adam optimizer [16] with the default parameters and a batch size of 128. In addition, we fix the margin α = 0.2 and m = 10 for the soft margin through the experiments."
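For context on the reported hyperparameters, here is a minimal sketch of the standard hinge-based triplet ranking loss widely used in image-text matching, with the paper's margin α = 0.2. The function name and signature are hypothetical, and the paper's soft-margin rectified variant (which uses m = 10 to soften α per sample) is not reproduced here.

```python
def triplet_ranking_loss(s_pos, s_neg_img, s_neg_txt, alpha=0.2):
    """Hinge-based triplet ranking loss for one matched image-text pair.

    s_pos:     similarity of the matched (image, text) pair
    s_neg_img: similarity of the image's hardest negative caption... er,
               of the text's hardest negative image
    s_neg_txt: similarity of the image's hardest negative caption
    alpha:     margin, fixed to 0.2 in the paper
    """
    # Penalize negatives that come within alpha of the positive score.
    return (max(0.0, alpha - s_pos + s_neg_txt)
            + max(0.0, alpha - s_pos + s_neg_img))
```

With a well-separated positive (e.g. `s_pos=0.9` vs. negatives at 0.5 and 0.6) the loss is zero; when negatives tie the positive, each term contributes the full margin.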