BiMAC: Bidirectional Multimodal Alignment in Contrastive Learning

Authors: Masoumeh Zareapoor, Pourya Shamsolmoali, Yue Lu

AAAI 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Extensive experiments demonstrate the superiority of our model over state-of-the-art models in various vision-language tasks. ... To ensure fairness and consistency in our experiments, we adopt the CoCa framework as the base architecture..."

Researcher Affiliation | Academia | "1 Shanghai Jiao Tong University, Shanghai, China; 2 East China Normal University, Shanghai, China; 3 University of York, York, United Kingdom"

Pseudocode | Yes | "Algorithm 1: BiMAC: Bidirectional Multimodal Alignment"

Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.

Open Datasets | Yes | "The pretraining process uses two versions of conceptual captions datasets, i.e., CC3M (Sharma et al. 2018) and CC12M (Changpinyo et al. 2021) (together denoted as CC15M), which, after filtering out invalid URLs, consists of 13M image-text pairs. ... The models are trained to find the most relevant sample corresponding to a specific input across different modalities, using the Flickr30K and MSCOCO datasets. ... We fine-tune the models on the COCO Captions dataset (Lin et al. 2014) and evaluate their performance using metrics: BLEU@4, METEOR, CIDEr, and SPICE. We further evaluate the models on the NoCaps dataset (Agrawal et al. 2019) in a zero-shot setting..."

Dataset Splits | Yes | "The models are trained to find the most relevant sample corresponding to a specific input across different modalities, using the Flickr30K (1K test set) and MSCOCO (5K test set) datasets. ... Following (Wang et al. 2023b), we fine-tune the models on the COCO Captions dataset (Lin et al. 2014) and evaluate their performance using metrics: BLEU@4, METEOR, CIDEr, and SPICE. As shown in Table 2, BiMAC consistently outperforms all baselines, achieving improvements over CoCa ranging from 4% to 7%. We further evaluate the models on the NoCaps dataset (Agrawal et al. 2019) in a zero-shot setting, without any additional fine-tuning. ... We report BLEU@4, METEOR, CIDEr, and SPICE scores on the Karpathy test split."

Hardware Specification | Yes | "The model was trained over 30 epochs using 8 RTX 3090 GPUs, with a batch size of 1024 and an image resolution of 256 × 256."

Software Dependencies | No | The paper mentions "The Adam optimizer" but does not name any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow), nor the specific Adam implementation used.

Experiment Setup | Yes | "The Adam optimizer with an initial learning rate of 2 × 10−4 combined with a cosine decay schedule. The model was trained over 30 epochs using 8 RTX 3090 GPUs, with a batch size of 1024 and an image resolution of 256 × 256. The coefficients were set to λCG = 1 in line with CoCa, λGIT = 0.7 and the temperature τ = 0.5."
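The reported experiment setup can be sketched in plain Python. This is a minimal illustration, not the authors' code: the paper states only an initial learning rate of 2 × 10−4, a cosine decay schedule over 30 epochs, and the loss coefficients λCG = 1, λGIT = 0.7; the decay-to-zero form and the additive loss combination below are assumptions.

```python
import math

# Values reported in the paper's experiment setup.
INITIAL_LR = 2e-4   # initial Adam learning rate
EPOCHS = 30         # total training epochs
LAMBDA_CG = 1.0     # contrastive/generative coefficient (in line with CoCa)
LAMBDA_GIT = 0.7    # coefficient for the second objective

def cosine_lr(epoch: int) -> float:
    """Learning rate at a given epoch under plain cosine decay to zero.

    The exact schedule (warmup, floor value) is not specified in the
    paper; this assumes decay from INITIAL_LR to 0 over EPOCHS.
    """
    return INITIAL_LR * 0.5 * (1.0 + math.cos(math.pi * epoch / EPOCHS))

def total_loss(loss_cg: float, loss_git: float) -> float:
    """Weighted sum of the two training objectives (assumed additive)."""
    return LAMBDA_CG * loss_cg + LAMBDA_GIT * loss_git
```

With this sketch, the learning rate starts at 2e-4 at epoch 0 and decays smoothly to 0 at epoch 30; any per-step (rather than per-epoch) scheduling or warmup phase would be a further assumption.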