Reproducibility Study of "Language-Image COnsistency"
Authors: Konrad Szewczyk, Patrik Bartak, Mikhail Vlasenko, Fanmin Shi
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This report aims to verify the findings and expand upon the evaluation and training methods from the paper LICO: Explainable Models with Language-Image COnsistency. The main claims from the original paper are that LICO (i) enhances interpretability by producing more explainable saliency maps in conjunction with a post-hoc explainability method and (ii) improves image classification performance without computational overhead during inference. We have reproduced the key experiments conducted by Lei et al.; however, the obtained results do not support the original claims. Additionally, we identify a limitation in the paper's evaluation method, which favors non-robust models, and propose robust experimental setups for more comprehensive quantitative analysis. Furthermore, we undertake additional studies on LICO's training methodology to enhance its interpretability. Our code is available at https://github.com/konradszewczyk/lico-reproduction. |
| Researcher Affiliation | Academia | Patrik Bartak EMAIL Informatics Institute, University of Amsterdam Konrad Szewczyk EMAIL Informatics Institute, University of Amsterdam Mikhail Vlasenko EMAIL Informatics Institute, University of Amsterdam Fanmin Shi EMAIL Informatics Institute, University of Amsterdam |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks for its own methodology. While it discusses the LICO algorithm from a referenced paper, it does not present any explicitly labeled 'Pseudocode' or 'Algorithm' sections for its reproduction or extension work. |
| Open Source Code | Yes | Our code is available at https://github.com/konradszewczyk/lico-reproduction. |
| Open Datasets | Yes | We train and evaluate the presented models on two image classification datasets. Following the original experiments, we use CIFAR-100 (Krizhevsky et al., 2009), which provides 50000 training and 10000 validation images divided into 100 classes. Additionally, we use ImageNet-S50 (Gao et al., 2022), consisting of 64431 training images and 752 validation images with segmentation masks and bounding box information that we use for extended evaluation. |
| Dataset Splits | Yes | We train and evaluate the presented models on two image classification datasets. Following the original experiments, we use CIFAR-100 (Krizhevsky et al., 2009), which provides 50000 training and 10000 validation images divided into 100 classes. Additionally, we use ImageNet-S50 (Gao et al., 2022), consisting of 64431 training images and 752 validation images with segmentation masks and bounding box information that we use for extended evaluation. |
| Hardware Specification | Yes | For the experiments, we use 2 machines with the following GPUs: NVIDIA GeForce RTX 4090 (Machine 1), and NVIDIA A100-SXM4-40GB (Machine 2). |
| Software Dependencies | No | To reduce the amount of code needed for the implementation, and to increase readability, we use the PyTorch Lightning framework (Falcon & The PyTorch Lightning team, 2019). |
| Experiment Setup | Yes | We use the original values for the hyperparameters that were specified by Lei et al. (2023): SGD optimizer with learning rate = 0.03, momentum = 0.9, weight decay = 0.0001, and cosine rate decay schedule, i.e. η = η0 · cos(7πk / 16K), where η0 denotes the initial learning rate, k is the index of the training step, and K is the total number of training steps. The LICO-specific parameters are also used unchanged: α = 10, β = 1, and the hidden dimension of the text projection MLP is 512. We use 100 epochs for all tested datasets and the ResNet-18 architecture unless otherwise stated. |
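The cosine decay schedule quoted in the Experiment Setup row, η = η0 · cos(7πk / 16K), can be sketched as a plain Python function. This is an illustrative sketch only (the function name is ours, not from the paper or its codebase), using the symbols as defined above:

```python
import math


def cosine_decay_lr(eta0: float, k: int, K: int) -> float:
    """Cosine learning-rate decay as described in the setup:
    eta = eta0 * cos(7 * pi * k / (16 * K)).

    eta0 -- initial learning rate (0.03 in the reported setup)
    k    -- index of the current training step
    K    -- total number of training steps
    """
    return eta0 * math.cos(7 * math.pi * k / (16 * K))
```

At k = 0 this returns the initial rate η0, and the rate decays monotonically toward η0 · cos(7π/16) at the final step, so it never reaches zero under this particular schedule.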