Position: Explainable AI Cannot Advance Without Better User Studies

Authors: Matej Pičulin, Bernarda Petek, Irena Ograjenšek, Erik Štrumbelj

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support this argument with a review of general and explainable AI-specific challenges, as well as an analysis of 607 explainable AI papers featuring user studies. We demonstrate how most user studies lack reproducibility, discussion of limitations, comparison with a baseline, or placebo explanations, and are of low fidelity to real-world users and application context. This, combined with an overreliance on functional evaluation, results in a lack of understanding of the value of explainable AI methods, which hinders the progress of the field. To address this issue, we call for higher methodological standards for user studies, greater appreciation of high-quality user studies in the AI community, and reduced reliance on functional evaluation. ... We analyzed user studies from 607 XAI academic papers. Throughout, we grouped the papers into papers published up to 2020, papers published 2021 and after, and papers published in top-4 conference venues (NeurIPS, ICLR, ICML, AAAI) from 2020 and after.
Researcher Affiliation | Academia | Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, Ljubljana, Slovenia. Correspondence to: Erik Štrumbelj <EMAIL>.
Pseudocode | No | The paper is a position paper that includes a review and analysis of existing literature, and does not propose a new algorithm or method that would require pseudocode.
Open Source Code | No | The paper does not provide source code for the methodology it describes. It provides a link to the dataset used for its analysis: "The list of all 607 papers with meta-data is available for download" (https://github.com/estrumbelj/XAI-user-studies-dataset/blob/main/dataset.csv).
Open Datasets | Yes | The list of all 607 papers with meta-data is available for download: https://github.com/estrumbelj/XAI-user-studies-dataset/blob/main/dataset.csv
Dataset Splits | Yes | To make the workload manageable, some of the data that require manual review are included only for a subsample of 116 papers. We sampled (with replacement) 30 papers published up to 2020, 30 papers published 2021 or later, and all 56 papers from the four conferences published 2020 or later. We used a simple Binomial-Beta Bayesian model with Beta(1/2) to infer the proportions, and we report 95% posterior CIs based on the 2.5% and 97.5% quantiles.
Hardware Specification | No | The paper does not explicitly describe the hardware used to conduct its analysis. For a literature review and analysis, specific hardware details are not typically provided.
Software Dependencies | No | The paper mentions using a statistical model: "We used a simple Binomial-Beta Bayesian model with Beta(1/2) to infer the proportions and we report 95% posterior CI based on 2.5% and 97.5% quantiles." However, it does not specify any software names with version numbers used for this analysis.
Experiment Setup | Yes | Here we provide details on how we collected and analyzed the XAI papers that were the basis for the empirical analysis in Section 4. The list of all 607 papers with meta-data is available for download. ... We performed three different searches, with Scopus as the starting point in all three. ... To make the workload manageable, some of the data that require manual review are included only for a subsample of 116 papers. We sampled (with replacement) 30 papers published up to 2020, 30 papers published 2021 or later, and all 56 papers from the four conferences published 2020 or later. We used a simple Binomial-Beta Bayesian model with Beta(1/2) to infer the proportions and we report 95% posterior CI based on 2.5% and 97.5% quantiles. ... The additional data on the XAI papers with a user study are summarized in Table 1.
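The analysis described above (a with-replacement subsample totalling 116 papers, and a Binomial-Beta model whose 95% posterior CI comes from the 2.5% and 97.5% quantiles) can be sketched in plain Python. Everything below the counts 30 + 30 + 56 is an assumption for illustration: the pool sizes, the example proportion (12 of 30), and the reading of the paper's shorthand "Beta(1/2)" as the symmetric Beta(1/2, 1/2) (Jeffreys) prior.

```python
import random

# Subsample described in the report: 30 + 30 papers drawn with replacement
# from the two publication-era groups, plus all 56 top-venue papers.
# Pool sizes here are placeholders; the real counts come from the dataset.
rng = random.Random(42)
pre_2020 = [f"pre2020_{i}" for i in range(300)]    # hypothetical pool
post_2021 = [f"post2021_{i}" for i in range(250)]  # hypothetical pool
top4 = [f"top4_{i}" for i in range(56)]            # all 56 included as-is

subsample = rng.choices(pre_2020, k=30) + rng.choices(post_2021, k=30) + top4
assert len(subsample) == 116

# Binomial-Beta model: with a Beta(a, b) prior and k "yes" labels out of
# n reviewed papers, the posterior on the proportion is Beta(a+k, b+n-k).
# "Beta(1/2)" is read here as Beta(1/2, 1/2) -- an assumption, since the
# shorthand in the excerpt is ambiguous.
def posterior_ci(k, n, a=0.5, b=0.5, draws=200_000):
    """Equal-tailed 95% posterior CI via Monte Carlo draws from Beta(a+k, b+n-k)."""
    samples = sorted(rng.betavariate(a + k, b + n - k) for _ in range(draws))
    return samples[int(0.025 * draws)], samples[int(0.975 * draws)]

lo, hi = posterior_ci(k=12, n=30)  # e.g. 12 of 30 sampled papers report X
print(f"95% posterior CI: [{lo:.3f}, {hi:.3f}]")
```

Monte Carlo quantiles are used instead of the exact Beta inverse-CDF so the sketch needs only the standard library; with SciPy available, `scipy.stats.beta(a + k, b + n - k).ppf([0.025, 0.975])` would give the same interval directly.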