Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval

Authors: Maurits Bleeker, Andrew Yates, Maarten de Rijke

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that, unlike reconstructing the input caption in the input space, LTD reduces predictive feature suppression, measured by obtaining higher recall@k, r-precision, and nDCG scores than a contrastive ICR baseline.
Researcher Affiliation | Academia | Maurits Bleeker (EMAIL, University of Amsterdam); Andrew Yates (EMAIL, University of Amsterdam); Maarten de Rijke (EMAIL, University of Amsterdam)
Pseudocode | No | The paper describes the Latent Target Decoding (LTD) method with equations and descriptive text, but it does not include a distinct block labeled "Pseudocode" or "Algorithm" with structured steps.
Open Source Code | Yes | To facilitate reproducibility and further research of our work, we include the code with our paper. (Footnotes: 1. https://huggingface.co/sentence-transformers/all-mpnet-base-v2; 2. https://github.com/MauritsBleeker/reducing-predictive-feature-suppression/)
Open Datasets | Yes | For training and evaluating our ICR method, we use the two common ICR benchmark datasets: Flickr30k (F30k) (Young et al., 2014) and MS-COCO captions (COCO) (Lin et al., 2014). [...] We also use the crisscrossed captions (CxC) dataset, which extends the COCO validation and test set with additional annotations of similar captions and images (Parekh et al., 2020)...
Dataset Splits | Yes | The F30k dataset contains 31,000 image-caption tuples. We use the train, validate, and test split from (Karpathy & Fei-Fei, 2015), with 29,000 images for training, 1,000 for validation, and 1,000 for testing. COCO consists of 123,287 image-caption tuples. We use the train, validate, and test split from (Karpathy & Fei-Fei, 2015); we do not use the 1k test setup.
Hardware Specification | No | The paper mentions that experiments are run "on a single GPU", but does not specify the exact GPU model (e.g., NVIDIA A100, RTX 2080 Ti), CPU, or other detailed hardware specifications.
Software Dependencies | No | The paper mentions specific model implementations such as the "Hugging Face all-MiniLM-L6-v2 Sentence-BERT implementation" and "BERT (Devlin et al., 2018)", but does not provide the programming language or machine learning framework versions (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiment.
Experiment Setup | Yes | Similar to (Chun et al., 2021), we use 30 warm-up and 30 fine-tune epochs, a batch size of 128, and a cosine annealing learning rate schedule with an initial learning rate of 2e-4. The Lagrange multiplier is initialized with a value of 1, bounded between 0 and 100, and is optimized by stochastic gradient ascent with a fixed learning rate of 5e-3 and a momentum (to prevent λ from fluctuating too much) and dampening value of α = 0.9. When we use Ldual, we set β to 1. For the InfoNCE loss, we use a temperature value τ of 0.05. [...] For the reconstruction constraint bound η, we try several values for all experiments, η ∈ {0.05, 0.1, 0.15, 0.2, 0.25, 0.3}. When we apply ITD we use η ∈ {0.5, 1, 2, 3, 4, 5, 6}.
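For readers reproducing the contrastive objective quoted above, the following is a minimal PyTorch sketch of a symmetric InfoNCE loss with the paper's reported temperature τ = 0.05 and batch size 128. It is an illustration under stated assumptions, not the authors' implementation; the function and variable names are hypothetical, and only the hyperparameter values are taken from the quote.

```python
import torch
import torch.nn.functional as F

def info_nce(image_emb: torch.Tensor, caption_emb: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of matched image/caption embeddings.

    Assumes row i of `image_emb` matches row i of `caption_emb`, so the
    positive pairs lie on the diagonal of the similarity matrix.
    """
    # L2-normalize so the dot products are cosine similarities
    image_emb = F.normalize(image_emb, dim=-1)
    caption_emb = F.normalize(caption_emb, dim=-1)
    logits = image_emb @ caption_emb.t() / tau        # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # matching pairs on the diagonal
    loss_i2c = F.cross_entropy(logits, targets)       # image -> caption retrieval
    loss_c2i = F.cross_entropy(logits.t(), targets)   # caption -> image retrieval
    return (loss_i2c + loss_c2i) / 2

# Example with random embeddings for a batch of 128 pairs (embedding dim is illustrative)
loss = info_nce(torch.randn(128, 256), torch.randn(128, 256))
```

The Lagrange-multiplier optimization of the reconstruction constraint bound η described in the quote is a separate component and is not sketched here.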