reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MASS: Overcoming Language Bias in Image-Text Matching

Authors: Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments have shown that MASS effectively lessens language bias without losing an understanding of linguistic compositionality. Overall, MASS offers a promising solution for enhancing image-text matching performance in visual-language models.
Researcher Affiliation	Academia	Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, South Korea
Pseudocode	No	The paper describes methods using mathematical equations and text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	code https://github.com/Jiwan Chung/mass aaai
Open Datasets	Yes	The Natural Color Dataset (NCD) (Anwar et al. 2020) is a dataset of various fruits colored either in the natural color or in gray. We adopt the counting benchmark in VALSE dataset (Parcalabescu et al. 2022). We evaluate our method in both text-to-image and image-to-text retrieval tasks using the test split of MS COCO dataset (Chen et al. 2015). The Winoground benchmark (Thrush et al. 2022) evaluates a VL model's capability to understand compositionality. SVO-Probes (Hendricks and Nematzadeh 2021) is another benchmark testing VL models' sensitivity to linguistic alterations.
Dataset Splits	Yes	We use the Karpathy split (Karpathy and Fei-Fei 2015) with 5000 test images.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud configurations) used to run its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies, such as programming language versions or library versions, used to replicate the experiments.
Experiment Setup	No	The paper states that its proposed method, MASS, 'does not require any additional training' and is an 'inference-time framework,' therefore it does not provide hyperparameters or training configurations for its own methodology. While evaluation protocols are detailed, specific training settings for the models used are not provided.