Diagnosing AI Explanation Methods with Folk Concepts of Behavior

Authors: Alon Jacovi, Jasmijn Bastings, Sebastian Gehrmann, Yoav Goldberg, Katja Filippova

JAIR 2023

Reproducibility Assessment

Research Type: Experimental
LLM Response: "We then demonstrate that many XAI methods today can be mapped to folk concepts of behavior in a qualitative evaluation. This allows us to uncover their failure modes that prevent current methods from explaining successfully, i.e., the information constructs that are missing for any given XAI method, and whose inclusion can decrease the likelihood of misunderstanding AI behavior. This section is a case study of such diagnoses of four common types of AI explanation."

Researcher Affiliation: Collaboration
LLM Response: Alon Jacovi (EMAIL), Bar-Ilan University; Jasmijn Bastings (EMAIL), Google DeepMind; Sebastian Gehrmann (EMAIL), Google Research; Yoav Goldberg (EMAIL), Bar-Ilan University and Allen Institute for Artificial Intelligence; Katja Filippova (EMAIL), Google DeepMind

Pseudocode: No
LLM Response: The paper describes a conceptual framework for evaluating AI explanations and applies it to various XAI methods through qualitative analysis and illustrative examples. It does not contain any structured pseudocode or algorithm blocks.

Open Source Code: No
LLM Response: The paper makes no explicit statement about releasing source code for the described methodology, and it provides no links to code repositories in the main text or appendices.

Open Datasets: No
LLM Response: The paper presents a conceptual framework and applies it to existing XAI methods using illustrative examples (e.g., a self-driving car, a restaurant review, carnivorism prediction). It does not introduce, use, or provide access information for any specific datasets in its own analysis or experiments.

Dataset Splits: No
LLM Response: The paper does not conduct experiments that require dataset splits. It focuses on conceptual analysis and qualitative evaluation of existing XAI methods using illustrative examples rather than empirical evaluation on specific datasets with train/validation/test splits.

Hardware Specification: No
LLM Response: The paper describes a theoretical framework and performs a qualitative evaluation of existing XAI methods. It does not report experiments requiring specific hardware, and therefore no hardware specifications are mentioned.

Software Dependencies: No
LLM Response: The paper does not detail any software dependencies or versions for its own methodology; its contribution is primarily theoretical and analytical rather than implementation-focused. It discusses existing XAI methods but does not specify software used to implement its framework.

Experiment Setup: No
LLM Response: The paper presents a conceptual framework and performs a qualitative analysis of existing XAI methods using illustrative examples. It does not include an experimental setup with specific hyperparameters, training configurations, or other system-level settings for new experiments.