reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient and Rigorous Model-Agnostic Explanations

Authors: Joao Marques-Silva, Jairo A. Lefebre-Lobaina, Maria Vanina Martinez

IJCAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The experiments confirm the scalability of the novel algorithms. Section 5: Experiments. This section presents results on the algorithms proposed in the paper. We opt to explain existing large-size datasets. The experiments consider 30 real-world datasets. The number of samples ranges from 1000 to over one million, whereas the number of features ranges from 10 to 250.
Researcher Affiliation	Academia	Joao Marques-Silva (1), Jairo A. Lefebre-Lobaina (2), Maria Vanina Martinez (2). (1) ICREA & Univ. Lleida, Spain; (2) Artificial Intelligence Research Institute (IIIA), CSIC, Spain. All listed institutions (ICREA, University of Lleida, Artificial Intelligence Research Institute (IIIA), CSIC) are academic/public research institutions.
Pseudocode	Yes	Algorithm 1 Finding one sb AXp. Input: E: Xp problem; WXp: sb WCXps; TCs: Counts Output: AXp X F...
Open Source Code	No	The paper does not provide an explicit statement about releasing its own source code, nor does it provide a link to a code repository for the methodology described. While 'Orange 3.38.0' is mentioned, this refers to a third-party tool used for an example.
Open Datasets	No	The experiments consider 30 real-world datasets. All datasets are concerned with solving supervised classification problems, that include different types of features (boolean, categorical, integer and real values). The paper mentions using "30 real-world datasets" but does not provide specific names, links, DOIs, or formal citations for these datasets to enable public access. The example datasets 'Da' and 'Db' are for illustration within the paper and not presented as publicly available for external use.
Dataset Splits	No	The experiments consider 30 real-world datasets... for each dataset, 10 different instances were randomly selected; in the experiments, the mean values are reported. The paper describes how instances were selected for evaluation but does not provide specific training/test/validation dataset splits (percentages, counts, or references to predefined splits) for the 30 real-world datasets used.
Hardware Specification	Yes	All experiments were conducted on a computer with an AMD Ryzen 7 4800HS and 16 GB of RAM.
Software Dependencies	Yes	Obtained with Orange 3.38.0, https://orangedatamining.com/.
Experiment Setup	Yes	The experiments consider 30 real-world datasets... for each dataset, 10 different instances were randomly selected; in the experiments, the mean values are reported. We measured the time required to compute: (i) one sb CXp; (ii) all sb CXps; (iii) one sb AXp; (iv) one smallest sb AXp; (v) several sb AXps; and (vi) feature relevancy & necessity (for sb AXps/sb CXps). The smallest sb AXp trivial cases (columns with only value 1 in the sb CXp set) were removed from the sb CXp set beforehand, as an additional optimization. The choice of 10K sb AXps aims solely at illustrating the scalability of the proposed algorithm.
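
The measurement protocol quoted under Experiment Setup (10 randomly selected instances per dataset, mean values reported) can be sketched as below. This is a minimal illustration, not the authors' harness: the function name `mean_runtime`, the `task` callable, and the fixed `seed` are all hypothetical, and the paper does not state how instances were sampled or timed.

```python
import random
import statistics
import time


def mean_runtime(instances, task, n_samples=10, seed=0):
    """Time `task` on randomly chosen instances and return (mean seconds, chosen).

    Hypothetical helper: `task` stands in for any of the measured
    computations (for example, finding one sb AXp for an instance).
    """
    rng = random.Random(seed)  # seeded so the selection is repeatable (an assumption)
    chosen = rng.sample(instances, min(n_samples, len(instances)))
    timings = []
    for inst in chosen:
        start = time.perf_counter()
        task(inst)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), chosen


if __name__ == "__main__":
    # Toy stand-in workload; a real run would call an explainer here.
    mean_s, picked = mean_runtime(list(range(1000)), lambda x: sum(range(x)))
    print(f"mean over {len(picked)} instances: {mean_s:.6f} s")
```

Fixing the seed is a design choice made here so that the same instances are selected on every run; the source does not say whether the original experiments did this.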