Reproducibility Study of "Language-Image COnsistency"
Authors: Konrad Szewczyk, Patrik Bartak, Mikhail Vlasenko, Fanmin Shi
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This report aims to verify the findings and expand upon the evaluation and training methods from the paper LICO: Explainable Models with Language-Image COnsistency. The main claims from the original paper are that LICO (i) enhances interpretability by producing more explainable saliency maps in conjunction with a post-hoc explainability method and (ii) improves image classification performance without computational overhead during inference. We have reproduced the key experiments conducted by Lei et al.; however, the obtained results do not support the original claims. Additionally, we identify a limitation in the paper's evaluation method, which favors non-robust models, and propose robust experimental setups for more comprehensive quantitative analysis. Furthermore, we undertake additional studies on LICO's training methodology to enhance its interpretability. Our code is available at https://github.com/konradszewczyk/lico-reproduction. |
| Researcher Affiliation | Academia | Patrik Bartak EMAIL Informatics Institute, University of Amsterdam Konrad Szewczyk EMAIL Informatics Institute, University of Amsterdam Mikhail Vlasenko EMAIL Informatics Institute, University of Amsterdam Fanmin Shi EMAIL Informatics Institute, University of Amsterdam |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks for its own methodology. While it discusses the LICO algorithm from a referenced paper, it does not present any explicitly labeled 'Pseudocode' or 'Algorithm' sections for its reproduction or extension work. |
| Open Source Code | Yes | Our code is available at https://github.com/konradszewczyk/lico-reproduction. |
| Open Datasets | Yes | We train and evaluate the presented models on two image classification datasets. Following the original experiments, we use CIFAR-100 (Krizhevsky et al., 2009), which provides 50000 training and 10000 validation images divided into 100 classes. Additionally, we use ImageNet-S50 (Gao et al., 2022), consisting of 64431 training images and 752 validation images with segmentation masks and bounding box information that we use for extended evaluation. |
| Dataset Splits | Yes | We train and evaluate the presented models on two image classification datasets. Following the original experiments, we use CIFAR-100 (Krizhevsky et al., 2009), which provides 50000 training and 10000 validation images divided into 100 classes. Additionally, we use ImageNet-S50 (Gao et al., 2022), consisting of 64431 training images and 752 validation images with segmentation masks and bounding box information that we use for extended evaluation. |
| Hardware Specification | Yes | For the experiments, we use 2 machines with the following GPUs: NVIDIA GeForce RTX 4090 (Machine 1), and NVIDIA A100-SXM4-40GB (Machine 2). |
| Software Dependencies | No | To reduce the amount of code needed for the implementation, and to increase readability, we use the PyTorch Lightning framework (Falcon & The PyTorch Lightning team, 2019). |
| Experiment Setup | Yes | We use the original values for the hyperparameters that were specified by Lei et al. (2023): SGD optimizer with learning rate = 0.03, momentum = 0.9, weight decay = 0.0001, and cosine rate decay schedule, i.e. η = η0 · cos(7πk / 16K), where η0 denotes the initial learning rate, k is the index of the training step, and K is the total number of training steps. The LICO-specific parameters are also used unchanged: α = 10, β = 1, and the hidden dimension of the text projection MLP is 512. We use 100 epochs for all tested datasets and the ResNet-18 architecture unless otherwise stated. |
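The cosine decay schedule quoted in the Experiment Setup row, η = η0 · cos(7πk / 16K), can be sketched as a plain Python function. This is an illustrative sketch only (the function name is ours, not from the paper or its codebase), using the symbols as defined above:

```python
import math


def cosine_decay_lr(eta0: float, k: int, K: int) -> float:
    """Cosine learning-rate decay as described in the setup:
    eta = eta0 * cos(7 * pi * k / (16 * K)).

    eta0 -- initial learning rate (0.03 in the reported setup)
    k    -- index of the current training step
    K    -- total number of training steps
    """
    return eta0 * math.cos(7 * math.pi * k / (16 * K))
```

At k = 0 this returns the initial rate η0, and the rate decays monotonically toward η0 · cos(7π/16) at the final step, so it never reaches zero under this particular schedule.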