Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models

Authors: Susmit Agrawal, Deepika Vemuri, Sri Siddarth Chakaravarthy P, Vineeth N. Balasubramanian

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive experimentation, we show that our approach obtains state-of-the-art classification performance compared to other concept-based models, achieving over 2× the classification performance in some cases. We also study the ability of our model to perform interventions on concepts, and show that it can localize visual concepts in input images, providing post-hoc interpretations.
Researcher Affiliation Academia Indian Institute of Technology Hyderabad EMAIL, EMAIL
Pseudocode No The paper describes the methodology using textual descriptions and mathematical formulas, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes Code https://github.com/Susmit-A/MuCIL Appendix https://susmit-a.github.io/misc/appendix.pdf
Open Datasets Yes We perform a comprehensive suite of experiments to study the performance of MuCIL on well-known benchmarks: CIFAR-100, ImageNet-100 (INet-100), and Caltech-UCSD Birds 200 (CUB200).
Dataset Splits No The paper mentions evaluating performance on a 'validation split' and refers to 'dataset details' in the appendix, but it does not provide specific percentages, sample counts, or explicit splitting methodology for the training, validation, and test sets within the main text.
Hardware Specification No The paper does not provide any specific details regarding the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments.
Software Dependencies No The paper discusses the use of a transformer architecture and models like GPT-3.5, but it does not explicitly list specific software dependencies (e.g., programming languages, libraries, frameworks) along with their version numbers required to reproduce the experiments.
Experiment Setup Yes In the CL setting, we study our performance over 5 and 10 experiences using concept-based methods in conjunction with three well-known CL algorithms: Experience Replay (ER) (Rebuffi et al. 2017), AGEM (Chaudhry et al. 2019), and DER++ (Buzzega et al. 2020), with a replay buffer size of 500 (we study other variations of buffer size in the Appendix). ... We empirically found λ1 = 5 and λ2 = 10 to give the best performance overall in terms of FAA, LA and grounding similarity, with LA = 0.7722 and cosine similarity 0.998.
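The quoted setup pairs concept-based methods with Experience Replay (ER) and a replay buffer of size 500. As an illustration only (not the authors' code; the `ReplayBuffer` class and the reservoir-sampling policy are assumptions about a typical ER buffer), a minimal sketch of a fixed-capacity replay buffer looks like:

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer with reservoir sampling, a common choice
    for Experience Replay (ER) in continual learning. capacity=500
    mirrors the buffer size reported in the paper."""

    def __init__(self, capacity=500, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0  # total examples observed across all experiences
        self.rng = random.Random(seed)

    def add(self, example):
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Reservoir sampling: each seen example is retained
            # with probability capacity / seen.
            idx = self.rng.randint(0, self.seen)
            if idx < self.capacity:
                self.buffer[idx] = example
        self.seen += 1

    def sample(self, batch_size):
        # Draw a replay minibatch to mix with the current experience.
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = ReplayBuffer(capacity=500)
for i in range(10_000):
    buf.add(i)
print(len(buf.buffer))  # 500
```

The buffer never exceeds its capacity, and reservoir sampling keeps an approximately uniform sample over everything seen so far, which is why a small buffer (500 here) can still represent earlier experiences.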