Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval
Authors: Maurits Bleeker, Andrew Yates, Maarten de Rijke
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that, unlike reconstructing the input caption in the input space, LTD reduces predictive feature suppression, measured by obtaining higher recall@k, r-precision, and nDCG scores than a contrastive ICR baseline. |
| Researcher Affiliation | Academia | Maurits Bleeker (University of Amsterdam), Andrew Yates (University of Amsterdam), Maarten de Rijke (University of Amsterdam) |
| Pseudocode | No | The paper describes the Latent Target Decoding (LTD) method with equations and descriptive text, but it does not include a distinct block labeled "Pseudocode" or "Algorithm" with structured steps. |
| Open Source Code | Yes | To facilitate reproducibility and further research of our work, we include the code with our paper: https://github.com/MauritsBleeker/reducing-predictive-feature-suppression/ |
| Open Datasets | Yes | For training and evaluating our ICR method, we use the two common ICR benchmark datasets: Flickr30k (F30k) (Young et al., 2014) and MS-COCO captions (COCO) (Lin et al., 2014). [...] We also use the crisscrossed captions (CxC) dataset, which extends the COCO validation and test set with additional annotations of similar captions and images (Parekh et al., 2020)... |
| Dataset Splits | Yes | The F30k dataset contains 31,000 image-caption tuples. We use the train, validate and test split from (Karpathy & Fei-Fei, 2015), with 29,000 images for training, 1,000 for validation, and 1,000 for testing. COCO consists of 123,287 image-caption tuples. We use the train, validate and test split from (Karpathy & Fei-Fei, 2015); we do not use the 1k test setup. |
| Hardware Specification | No | The paper mentions that experiments are run "on a single GPU", but does not specify the exact model of the GPU (e.g., NVIDIA A100, RTX 2080 Ti), CPU, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions specific model implementations like "Hugging Face all-MiniLM-L6-v2 Sentence-BERT implementation" and "BERT (Devlin et al., 2018)", but does not provide specific programming language or machine learning framework versions (e.g., Python 3.8, PyTorch 1.9) required to replicate the experiment. |
| Experiment Setup | Yes | Similar to (Chun et al., 2021), we use 30 warm-up and 30 fine-tune epochs, a batch size of 128, and a cosine annealing learning rate schedule with an initial learning rate of 2e-4. The Lagrange multiplier is initialized with a value of 1, bounded between 0 and 100, and is optimized by stochastic gradient ascent with a fixed learning rate of 5e-3 and a momentum and dampening value of α = 0.9 (to prevent λ from fluctuating too much). When we use L_dual, we set β to 1. For the InfoNCE loss, we use a temperature value τ of 0.05. [...] For the reconstruction constraint bound η, we try for all experiments several values, η ∈ {0.05, 0.1, 0.15, 0.2, 0.25, 0.3}. When we apply ITD we use η ∈ {0.5, 1, 2, 3, 4, 5, 6}. |
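The Lagrange multiplier update quoted above (stochastic gradient ascent, lr 5e-3, momentum and dampening 0.9, λ clamped to [0, 100]) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, the update follows the PyTorch SGD momentum/dampening convention, and we assume the gradient with respect to λ is the constraint gap (reconstruction loss minus the bound η).

```python
def update_lagrange_multiplier(lam, grad, velocity,
                               lr=5e-3, momentum=0.9, dampening=0.9,
                               lower=0.0, upper=100.0):
    """One stochastic gradient *ascent* step on the Lagrange multiplier λ.

    Uses the PyTorch-style SGD momentum buffer:
        v <- momentum * v + (1 - dampening) * grad
    then ascends (λ grows while the constraint is violated) and clamps
    λ to the [0, 100] bound reported in the paper.
    """
    velocity = momentum * velocity + (1.0 - dampening) * grad
    lam = lam + lr * velocity          # ascent, not descent
    lam = max(lower, min(upper, lam))  # keep λ within its bounds
    return lam, velocity

# Hypothetical usage: the "gradient" w.r.t. λ is the constraint gap
# (reconstruction loss minus a bound η = 0.2 from the paper's sweep).
lam, vel = 1.0, 0.0  # λ initialized to 1, as in the paper
for recon_loss in [0.4, 0.35, 0.3]:
    grad = recon_loss - 0.2
    lam, vel = update_lagrange_multiplier(lam, grad, vel)
```

With both momentum and dampening at 0.9, each new gradient contributes only 10% to the velocity, which is exactly the fluctuation-smoothing role the paper attributes to α.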