Discovering Clone Negatives via Adaptive Contrastive Learning for Image-Text Matching
Authors: Renjie Pan, Jihao Dong, Hua Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across several tasks demonstrate the effectiveness of AdaCL in image-text matching. Furthermore, we extend AdaCL to weakly-supervised image-text matching by replacing human-annotated descriptions with automatically generated captions, thereby increasing the number of potential clone negatives. AdaCL maintains robustness in this setting, alleviating the reliance on crowd-sourced annotations and laying a foundation for scalable vision-language contrastive learning. |
| Researcher Affiliation | Academia | Renjie Pan, Jihao Dong, Hua Yang; Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1: Adaptive Contrastive Learning. Input: a mini-batch of N image-text pairs, with N positives and N·M negatives. Output: L_ada. (1) for each mini-batch do: (2) select in-batch grounding salient negatives and clone negatives, S_sln = {s_{i+, j} : j ≤ M} with i+ = argmax_i SalientScore_i, and S_cln = {s_{i−, j} : j ≤ M} with i− = argmin_i SalientScore_i; (3) sort out the in-batch potential clone negatives S̃ and select the anchor based on S̃: p(C∣s) = 1 / (1 + (π_c̄·σ_c)/(π_c·σ_c̄) · exp[(s − µ_c)²/(2σ_c²) − (s − µ_c̄)²/(2σ_c̄²)]), S̃ := {s : p(C∣s) > p(C̄∣s)}, anchor := s_pos∣δ = median(S̃); (4) obtain the probability of the anchor for tuning, p̂_u = exp[m1(anchor − m2)] / (exp[m1(anchor − m2)] + P); (5) compute m1 and m2 according to Eq. 4 and Eq. 6, m1 = log(ε·p̂_u / ((1 − ε)(1 − p̂_u))) / (anchor − 1), m2 = anchor + log((1 − p̂_u) / (p̂_u·P)) / m1; (6) update p̂_i(I) and L_ada, p̂_i(I) = exp[m1(s(I, T_i) − m2)] / (exp[m1(s(I, T_i) − m2)] + Σ_{j=1, j≠i}^{M+1} exp[s(I, T_j)]), L_ada = E_{I∼D}[H(y(I), p̂(I))]. |
| Open Source Code | No | The paper does not provide a direct link to a code repository or an explicit statement about releasing the source code for their proposed method. It mentions releasing 'datasets based on pseudo captions' but not the code. |
| Open Datasets | Yes | We evaluate AdaCL on two image-text matching datasets. (1) Flickr30K (Young et al., 2014) consists of 31,783 images, with a training/test/validation split of 29,783/1,000/1,000. (2) MS-COCO (Lin et al., 2014) consists of 123,287 images, with a training/test/validation split of 113,287/5,000/5,000. |
| Dataset Splits | Yes | Datasets. We evaluate AdaCL on two image-text matching datasets. (1) Flickr30K (Young et al., 2014) consists of 31,783 images, with a training/test/validation split of 29,783/1,000/1,000. (2) MS-COCO (Lin et al., 2014) consists of 123,287 images, with a training/test/validation split of 113,287/5,000/5,000. The test sets are divided into MS-COCO 5-fold 1K (average results over 5 test sets) and MS-COCO 5K (results on 5,000 test images). |
| Hardware Specification | Yes | All experiments are performed on four NVIDIA Tesla V100s. |
| Software Dependencies | No | The paper mentions several frameworks and tools used (ResNet, BiGRU, Faster R-CNN, BERT, CLIP, Adam optimizer, BLIP, GIT, BLIP-2, CoCa) but does not provide specific version numbers for the software dependencies, such as Python, PyTorch, or TensorFlow, that would be required for full reproducibility. |
| Experiment Setup | Yes | Training Details. All experiments are performed on four NVIDIA Tesla V100s. For image-text matching, we use a mini-batch size of 64 and the Adam optimizer. The learning rate is 0.0002 and decays by 15% every 10 epochs after epoch 20. The maximum sentence length is a = 32. For Faster R-CNN, the region number is n = 36. The dimension of the joint embedding space D is set to 256. We follow (He et al., 2020) in using a momentum memory bank, where the momentum coefficient z is set to 0.99 and the size M is 4096. For AdaCL, p̂_u is set to 0.03 and ε = e⁻⁷; m1 and m2 are initialized to 20 and 0.1, respectively, for adaptive tuning. |
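Steps 4–6 of the extracted Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names and the toy similarity matrix are assumptions, the margin-tuning formula for m1 follows the reconstructed Eq. 4 above, and the Gaussian-mixture anchor selection of steps 2–3 is omitted. The hyper-parameter defaults (p̂_u = 0.03, ε = e⁻⁷, initial m1 = 20, m2 = 0.1) follow the reported experiment setup.

```python
import numpy as np

def tune_margins(anchor, neg_sum, p_u=0.03, eps=np.exp(-7)):
    """Sketch of steps 4-5: re-derive m1 and m2 from the anchor similarity.

    `anchor` is the similarity selected from the clone-negative distribution;
    `neg_sum` is P, the summed exp-similarities of the anchor's negatives.
    The exact form of Eq. 4 (m1) is an assumption reconstructed from the
    garbled extraction; m2 exactly inverts the definition of p_hat_u.
    """
    m1 = np.log(eps * p_u / ((1 - eps) * (1 - p_u))) / (anchor - 1.0)
    # Invert p_u = exp[m1(anchor - m2)] / (exp[m1(anchor - m2)] + P) for m2.
    m2 = anchor + np.log((1 - p_u) / (p_u * neg_sum)) / m1
    return m1, m2

def adaptive_contrastive_loss(sim, m1, m2):
    """Sketch of step 6: cross-entropy with a sharpened positive logit.

    sim: (N, N) image-text similarity matrix; sim[i, i] is the positive pair.
    """
    n = sim.shape[0]
    pos = np.exp(m1 * (np.diag(sim) - m2))            # exp[m1(s(I, T_i) - m2)]
    off = ~np.eye(n, dtype=bool)
    negs = np.exp(sim)[off].reshape(n, n - 1).sum(1)  # sum_j exp[s(I, T_j)], j != i
    p_hat = pos / (pos + negs)                        # p_hat_i(I)
    return -np.log(p_hat).mean()                      # E_I[H(y(I), p_hat(I))]
```

With the reported initial values m1 = 20 and m2 = 0.1, well-separated positives drive p̂_i toward 1 and the loss toward 0, while the per-batch re-tuning keeps the anchor's probability pinned near p̂_u.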