Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MASS: Overcoming Language Bias in Image-Text Matching
Authors: Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments have shown that MASS effectively lessens language bias without losing an understanding of linguistic compositionality. Overall, MASS offers a promising solution for enhancing image-text matching performance in visual-language models. |
| Researcher Affiliation | Academia | Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, South Korea |
| Pseudocode | No | The paper describes methods using mathematical equations and text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | code https://github.com/Jiwan Chung/mass aaai |
| Open Datasets | Yes | The Natural Color Dataset (NCD) (Anwar et al. 2020) is a dataset of various fruits colored either in the natural color or in gray. We adopt the counting benchmark in VALSE dataset (Parcalabescu et al. 2022). We evaluate our method in both text-to-image and image-to-text retrieval tasks using the test split of MS COCO dataset (Chen et al. 2015). The Winoground benchmark (Thrush et al. 2022) evaluates a VL model's capability to understand compositionality. SVO-Probes (Hendricks and Nematzadeh 2021) is another benchmark testing VL models' sensitivity to linguistic alterations. |
| Dataset Splits | Yes | We use the Karpathy split (Karpathy and Fei-Fei 2015) with 5000 test images. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud configurations) used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as programming language versions or library versions, used to replicate the experiments. |
| Experiment Setup | No | The paper states that its proposed method, MASS, 'does not require any additional training' and is an 'inference-time framework,' therefore it does not provide hyperparameters or training configurations for its own methodology. While evaluation protocols are detailed, specific training settings for the models used are not provided. |