reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Interpretable Local Concept-based Explanation with Human Feedback to Predict All-cause Mortality

Authors: Radwa El Shawi, Mouaz H. Al-Mallah

JAIR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we introduce the dataset used in this work and the concepts definition in Sections 4.1, and 4.2, respectively. We define baselines in Section 4.3 to be compared to the proposed approach in Section 4.4. The faithfulness of the proposed approach is evaluated in Section 4.5. We evaluate the trust in the explanations of CLEF in Section 4.6. We show in Section 4.7 the effectiveness of the explanations of CLEF in detecting bias in data.
Researcher Affiliation	Collaboration	Radwa Elshawi, Institute of Computer Science, Tartu University, Estonia; Mouaz H Al-Mallah, Houston Methodist DeBakey Heart & Vascular Center, Houston, TX, USA
Pseudocode	Yes	Algorithm 1: Algorithm for interactively proposing intuitive and interpretable concepts with human feedback
Open Source Code	No	The paper mentions using "scikit-learn implementations (Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, et al., 2011)" for training. However, it does not explicitly state that the authors' own implementation of CLEF is open-source, nor does it provide any links to a code repository.
Open Datasets	Yes	4.1 Henry Ford FIT Dataset. The dataset of this study was collected from patients who underwent treadmill stress testing by physician referrals at Henry Ford Affiliated Hospitals in metropolitan Detroit, MI in the United States, FIT Project (Al-Mallah et al., 2014).
Dataset Splits	Yes	The dataset used in this work is split 60% for training, 20% for validation and 20% for testing.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. It only mentions training models like random forest and support vector machine.
Software Dependencies	No	The paper states: "We train all approaches using the scikit-learn implementations (Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, et al., 2011)." While scikit-learn is mentioned, a specific version number is not provided, only a citation to its foundational paper.
Experiment Setup	Yes	In this work, we use a fixed depth of 4, leaving the exploration of dynamic depth to future work. More specifically, each concept c_j is associated with two lists of features: the explored list l_j consists of features that have been proposed to a clinician as associated with concept c_j, and the other list u_j consists of the features that have not yet been proposed for concept c_j. If the clinician accepts the proposed feature-concept association, then the proposed feature is added to the concept definition and the feature-concept matrix entry is set to A_{i,j} = 1; otherwise, the feature-concept matrix remains unchanged. List l_j is first initialized with a single feature i such that A_{i,j} = 1 for each concept j, and u_j is initialized with the remaining features that are not included in l_j. Algorithm 1 models the human feedback while proposing feature-concept associations by incorporating the clinician's prior acceptance of feature-concept associations to improve future proposals made by the algorithm, and refits model f each time g is updated. To do so, we store a set of labels of the proposals that the user has previously accepted or rejected in matrix intuit. This matrix is first initialized so that intuit_{i,j} = 1 and intuit_{i,j'} = 0 for j' ≠ j if A_{i,j} = 1 in the concept definitions initialized by the user. The matrix is then updated such that intuit_{i,j} = 1 if the user accepts the proposed feature-concept association; otherwise, it remains unchanged. We assume that a single feature can be associated with different concepts. The key challenge is to propose feature-concept associations that are intuitive for the clinician and equally highly faithful to the model being explained. If the proposal is highly [...] For each concept, we make a fixed number of proposals before moving to the next concept. In this work, we use a fixed number of proposals per concept, numproposals = 7.
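
The bookkeeping described in the Experiment Setup excerpt (explored and unexplored lists per concept, the feature-concept matrix A, the intuit matrix, and a fixed budget of 7 proposals per concept) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the `accepts` callback standing in for the clinician, and the random `choose` policy are all assumptions. The excerpt is truncated where it describes how proposals are selected, and the refitting of model f is omitted.

```python
import random

NUM_PROPOSALS = 7  # fixed number of proposals per concept, as stated in the excerpt


def init_state(n_features, seeds):
    """Build A, intuit, and the explored/unexplored lists.

    seeds[j] is the single feature index that initially defines concept j
    (hypothetical input format, chosen for this sketch).
    """
    n_concepts = len(seeds)
    A = [[0] * n_concepts for _ in range(n_features)]
    intuit = [[0] * n_concepts for _ in range(n_features)]
    explored, unexplored = [], []
    for j, i in enumerate(seeds):
        A[i][j] = 1          # seed feature defines the concept
        intuit[i][j] = 1     # and counts as an accepted association
        explored.append([i])
        unexplored.append([f for f in range(n_features) if f != i])
    return A, intuit, explored, unexplored


def run_proposals(A, intuit, explored, unexplored, accepts, choose=None, rng=None):
    """Make up to NUM_PROPOSALS proposals per concept, then move to the next.

    accepts(i, j) -> bool stands in for the clinician's answer.
    choose(j, candidates) picks the next feature to propose; the default is a
    random placeholder, NOT the selection rule used in the paper.
    """
    rng = rng or random.Random(0)
    if choose is None:
        choose = lambda j, candidates: rng.choice(candidates)
    for j in range(len(explored)):
        for _ in range(NUM_PROPOSALS):
            if not unexplored[j]:
                break  # nothing left to propose for this concept
            i = choose(j, unexplored[j])
            unexplored[j].remove(i)   # move feature from u_j ...
            explored[j].append(i)     # ... to l_j
            if accepts(i, j):
                A[i][j] = 1           # feature joins the concept definition
                intuit[i][j] = 1      # remember the accepted proposal
            # On rejection both matrices stay unchanged.
            # The paper also refits model f when g changes; omitted here.
    return A, intuit
```

For example, calling `init_state(10, [0, 1])` and then `run_proposals(...)` with `accepts=lambda i, j: i % 2 == 0` leaves each concept with eight explored features (one seed plus seven proposals) and sets A only for the seed and the accepted even-numbered features.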