Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
Authors: Jayneel Parekh, Quentin Bouniot, Pavlo Mozharovskyi, Alasdair Newson, Florence d'Alché-Buc
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts. The experiments are conducted on multiple image recognition benchmarks for large-scale images. Project page available at https://jayneelparekh.github.io/VisCoIN_project_page/ (Abstract) and "4 EXPERIMENTS AND RESULTS" |
| Researcher Affiliation | Academia | 1ISIR, Sorbonne Université, 2LTCI, Télécom Paris, Institut Polytechnique de Paris, France, 3Technical University of Munich, 4Helmholtz Munich, 5Munich Center for Machine Learning (MCML) |
| Pseudocode | No | The paper describes methods using mathematical equations and textual explanations, but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page available at https://jayneelparekh.github.io/VisCoIN_project_page/ |
| Open Datasets | Yes | Datasets We experiment on image recognition tasks for large-scale images in three different domains with a greater focus on multi-class classification tasks: (1) Binary age classification (young/old) on CelebA-HQ (Karras et al., 2018), (2) fine-grained bird classification for 200 classes on Caltech-UCSD-Birds-200 (CUB-200) (Wah et al., 2011), and (3) fine-grained car model classification of 196 classes on Stanford Cars (Krause et al., 2013). |
| Dataset Splits | No | The paper refers to 'training data' and 'test data' throughout the experimental sections (e.g., 'median of FFx over the test data', 'Stest k', 'The GANs are trained only on the training data'), but does not explicitly provide specific percentages, sample counts, or methodology for splitting the datasets into training, validation, and test sets. |
| Hardware Specification | Yes | All experiments were conducted on a single V100-32GB GPU. (Section 4) All of these experiments have been conducted on a single A100 GPU, with a batch size of 64 for CelebA-HQ and 128 for Stanford-Cars dataset and on V100 GPU with a batch size of 32 for CUB-200. (Appendix C.2.1) |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and the 'official StyleGAN2-ADA PyTorch repository' but does not specify version numbers for PyTorch, Python, or any other software libraries used. |
| Experiment Setup | Yes | We train for 50K iterations on CelebA-HQ and 100K iterations on CUB-200 and Stanford-Cars. We use Adam optimizer with learning rate 0.0001 for all subnetworks and on all datasets. During training, each batch consists of 8 samples from the training data and 8 synthetic samples randomly generated using G. The training data samples use a random cropping and random horizontal flip augmentation in all cases. All images are normalized to the range [-1, 1] and have resolution 256×256 for processing. Table 3: Hyperparameters values for VisCoIN (K, α, β, γ, δ for each dataset). |
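The per-dataset setup quoted in the Experiment Setup row can be summarized as a small configuration sketch. This is a minimal, hypothetical illustration assembled from the numbers quoted above (iterations, learning rate, batch composition, resolution); the class and field names are illustrative assumptions, not from the VisCoIN codebase.

```python
# Hypothetical sketch of the training setup reported for VisCoIN.
# All names (TrainConfig, CONFIGS, field names) are illustrative assumptions;
# only the numeric values come from the quoted Experiment Setup row.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    dataset: str
    iterations: int            # 50K for CelebA-HQ, 100K otherwise
    lr: float = 1e-4           # Adam, same learning rate for all subnetworks
    real_per_batch: int = 8    # real training samples per batch
    synth_per_batch: int = 8   # synthetic samples generated with G per batch
    resolution: int = 256      # images at 256x256, normalized to [-1, 1]


CONFIGS = {
    "CelebA-HQ": TrainConfig("CelebA-HQ", iterations=50_000),
    "CUB-200": TrainConfig("CUB-200", iterations=100_000),
    "Stanford-Cars": TrainConfig("Stanford-Cars", iterations=100_000),
}
```

Keeping the setup as explicit per-dataset configs like this makes the differences (only the iteration count varies) easy to audit against the paper's Table 3.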