Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MASS: Overcoming Language Bias in Image-Text Matching

Authors: Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments have shown that MASS effectively lessens language bias without losing an understanding of linguistic compositionality. Overall, MASS offers a promising solution for enhancing image-text matching performance in visual-language models.
Researcher Affiliation | Academia | Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, South Korea
Pseudocode | No | The paper describes methods using mathematical equations and text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/Jiwan Chung/mass aaai
Open Datasets | Yes | The Natural Color Dataset (NCD) (Anwar et al. 2020) is a dataset of various fruits colored either in the natural color or in gray. We adopt the counting benchmark in VALSE dataset (Parcalabescu et al. 2022). We evaluate our method in both text-to-image and image-to-text retrieval tasks using the test split of MS COCO dataset (Chen et al. 2015). The Winoground benchmark (Thrush et al. 2022) evaluates a VL model's capability to understand compositionality. SVO-Probes (Hendricks and Nematzadeh 2021) is another benchmark testing VL models' sensitivity to linguistic alterations.
Dataset Splits | Yes | We use the Karpathy split (Karpathy and Fei-Fei 2015) with 5000 test images.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud configurations) used to run its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies, such as programming language versions or library versions, used to replicate the experiments.
Experiment Setup | No | The paper states that its proposed method, MASS, 'does not require any additional training' and is an 'inference-time framework,' and therefore does not provide hyperparameters or training configurations for its own methodology. While evaluation protocols are detailed, specific training settings for the models used are not provided.
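The Open Datasets and Dataset Splits rows mention text-to-image and image-to-text retrieval on the 5000-image Karpathy test split of MS COCO. Such retrieval is conventionally scored with Recall@K, although this page does not state which metric the paper reports. The following is a minimal sketch of Recall@K under stated assumptions: the helper `recall_at_k` is hypothetical, the similarity matrix is assumed to come from whatever image-text scorer is being evaluated (MASS's own scoring rule is not described here), and every image is assumed to have at least one caption (Karpathy test images have roughly five each).

```python
import numpy as np


def recall_at_k(sim, caption_to_image, ks=(1, 5, 10)):
    """Hypothetical Recall@K helper for both retrieval directions.

    sim: (num_images, num_captions) similarity matrix, higher = more similar.
         Assumed to be precomputed by the scorer under evaluation.
    caption_to_image: length-num_captions array; caption j describes
         image caption_to_image[j].
    """
    sim = np.asarray(sim, dtype=float)
    caption_to_image = np.asarray(caption_to_image)
    num_images, _ = sim.shape

    # Text-to-image: for each caption, rank all images (descending score)
    # and record the 0-based position of its ground-truth image.
    t2i_order = np.argsort(-sim.T, axis=1, kind="stable")
    t2i_rank = np.argmax(t2i_order == caption_to_image[:, None], axis=1)

    # Image-to-text: for each image, rank all captions; the position of the
    # first caption that belongs to this image is the image's rank.
    i2t_order = np.argsort(-sim, axis=1, kind="stable")
    matches = caption_to_image[i2t_order] == np.arange(num_images)[:, None]
    if not matches.any(axis=1).all():
        raise ValueError("every image needs at least one caption")
    i2t_rank = np.argmax(matches, axis=1)

    # A query counts as a hit at K when its rank is below K.
    return {
        "t2i": {k: float(np.mean(t2i_rank < k)) for k in ks},
        "i2t": {k: float(np.mean(i2t_rank < k)) for k in ks},
    }


# Toy usage: 2 images, 4 captions (captions 0-1 describe image 0, 2-3 image 1).
sim = [[0.9, 0.1, 0.2, 0.3],
       [0.1, 0.8, 0.7, 0.6]]
print(recall_at_k(sim, [0, 0, 1, 1], ks=(1, 2)))
```

The two directions are computed separately because they are not symmetric: in image-to-text retrieval an image counts as retrieved at K if any of its captions ranks in the top K, whereas in text-to-image retrieval each caption has exactly one correct image.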