Contextual Explanation Networks

Authors: Maruan Al-Shedivat, Avinava Dubey, Eric Xing

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We analyze the proposed framework theoretically and experimentally. Our results on image and text classification and survival analysis tasks demonstrate that CENs are not only competitive with the state-of-the-art methods but also offer additional insights behind each prediction, that can be valuable for decision support."
Researcher Affiliation | Collaboration | Maruan Al-Shedivat (EMAIL, Carnegie Mellon University); Avinava Dubey (EMAIL, Google Research); Eric Xing (EMAIL, Carnegie Mellon University & Petuum Inc.)
Pseudocode | No | The paper describes algorithms and models using mathematical formulations and descriptive text, but it does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is available at https://github.com/alshedivat/cen."
Open Datasets | Yes | "We use two publicly available datasets for survival analysis of the intensive care unit (ICU) patients: (a) SUPPORT2 (http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets) and (b) data from the PhysioNet 2012 challenge (https://physionet.org/challenge/2012/)."
Dataset Splits | Yes | IMDB: "This dataset has 25k labelled reviews used for training and validation, 25k labelled reviews that are held out for test, and 50k unlabelled reviews." (Section 6.1.2) MNIST: "We used the classical split of the dataset into 50k training, 10k validation, and 10k testing points." (Appendix B.1) SUPPORT2: "The data had 9105 patient records (7105 training, 1000 validation, 1000 test)..." (Section 6.3.3)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory) used for running its experiments. It mentions using a VGG-F network and Keras with a TensorFlow backend, which are software/model related, not hardware.
Software Dependencies | Yes | "reimplemented in Keras (Chollet et al., 2015) with TensorFlow (Abadi et al., 2016) backend." (Appendix B.1) "All models were trained for 100 epochs using the AMSGrad optimizer (Reddi et al., 2019) with the learning rate of 10^-3." (Appendix B.1)
Experiment Setup | Yes | "All models were trained for 100 epochs using the AMSGrad optimizer (Reddi et al., 2019) with the learning rate of 10^-3. No data augmentation was used in any of our experiments." (Appendix B.1) "Context encoder of the CEN model uses VGG-F to process images, followed by an attention layer over a dictionary of 16 trainable linear explanations defined over the categorical features (Figure 3)." (Section 6.1.1) Appendix B.3 provides tables summarizing top-performing architectures, including details on convolutional blocks, dense layers, dropout rates, dictionary sizes, and regularization penalties (L1, L2).
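The reported optimizer setup (AMSGrad, learning rate 10^-3, 100 epochs) can be sketched in Keras, the framework the paper states it used. This is a minimal illustration, not the authors' code (which is at https://github.com/alshedivat/cen); the tiny Dense model is a hypothetical stand-in for the actual CEN architecture, and AMSGrad is assumed to be enabled via the `amsgrad` flag on `Adam`, which is how Keras exposes it.

```python
import tensorflow as tf

# Hypothetical placeholder model; the real CEN uses a VGG-F context
# encoder with attention over a dictionary of linear explanations.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# AMSGrad (Reddi et al., 2019) is a flag on Keras' Adam optimizer;
# learning rate 1e-3 matches the setting quoted from Appendix B.1.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, amsgrad=True)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy")

# Training would then run for 100 epochs with no data augmentation:
# model.fit(x_train, y_train, epochs=100)
```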