Implications of Model Indeterminacy for Explanations of Automated Decisions

Authors: Marc-Etienne Brunet, Ashton Anderson, Richard Zemel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To explore the extent to which model indeterminacy may impact the consistency of explanations in a practical setting, we conduct a series of experiments.
Researcher Affiliation | Academia | Marc-Etienne Brunet (University of Toronto, Vector Institute); Ashton Anderson (University of Toronto, Vector Institute); Richard Zemel (University of Toronto, Columbia University, Vector Institute)
Pseudocode | No | The paper describes methods and mathematical formulations but does not contain a structured pseudocode or algorithm block, nor is there a section explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code | No | Experimental source code will be made available at github.com/mebrunet/model-indeterminacy
Open Datasets | Yes | We use three different (binary) risk assessment datasets (all available on Kaggle): UCI Credit Card [35], Give Me Some Credit, and Porto Seguro's Safe Driver Prediction. Their details can be found in Appendix B.1.
Dataset Splits | Yes | We first split each dataset into a development and a holdout set (70 / 30), and apply one-hot encoding and standard scaling. We then run a model selection process with three model classes: logistic regression (LR), multi-layer perceptron (MLP), and a tabular ResNet (TRN) recently proposed by Gorishniy et al. [10]. We sweep through a range of hyperparameter settings, trying a total of 408 model-hyperparameter configurations per dataset. For each configuration, we pick a random seed and use it to control a shuffled split of the development dataset into train and validation sets (70 / 30). (A sketch of this split and preprocessing pipeline appears after the table.)
Hardware Specification | No | Our experiments were conducted on a GPU-accelerated computing cluster.
Software Dependencies | No | ML models were written in PyTorch [26], and the analysis used NumPy [12] and Matplotlib [13].
Experiment Setup | Yes | We sweep through a range of hyperparameter settings, trying a total of 408 model-hyperparameter configurations per dataset. For each configuration, we pick a random seed and use it to control a shuffled split of the development dataset into train and validation sets (70 / 30). This seed also controls the randomness used in training (optimization). We fit the models using Adam [15] with a patience-based stopping criterion on the validation set. We also up-weight the rare class, creating a balanced loss. We repeat this process with 3 random seeds per configuration, obtaining a total of 1224 model instances per dataset. (A sketch of this training procedure appears after the table.)
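
To make the split and preprocessing procedure quoted under Dataset Splits concrete, here is a minimal sketch. It is an illustration only: the file path, target column name, and seed value are hypothetical, and the paper's exact feature handling and 408-configuration model-selection sweep are not reproduced here.

```python
# Hedged sketch of the 70/30 development/holdout split, one-hot encoding and
# standard scaling, and the seeded 70/30 train/validation split of the
# development set. File path, target column, and seed are placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("uci_credit_card.csv")  # hypothetical path to one of the Kaggle datasets
y = df.pop("target").values              # hypothetical binary target column
X = df

# One fixed 70/30 development / holdout split per dataset.
X_dev, X_hold, y_dev, y_hold = train_test_split(X, y, test_size=0.30, random_state=0)

# One-hot encode categorical columns, standard-scale the numeric ones.
cat_cols = X.select_dtypes(include="object").columns
num_cols = X.columns.difference(cat_cols)
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("scale", StandardScaler(), num_cols),
])
X_dev = preprocess.fit_transform(X_dev)
X_hold = preprocess.transform(X_hold)

# Each model-hyperparameter configuration gets a random seed that controls a
# shuffled 70/30 train/validation split of the development set (the paper
# reuses the same seed for training randomness as well).
seed = 0  # placeholder; the paper uses 3 seeds per configuration
X_tr, X_val, y_tr, y_val = train_test_split(
    X_dev, y_dev, test_size=0.30, random_state=seed, shuffle=True
)
```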
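The training loop quoted under Experiment Setup (Adam, a balanced loss obtained by up-weighting the rare class, and patience-based stopping on the validation set) can be sketched as follows. The model, learning rate, patience, and full-batch updates are placeholder assumptions rather than the paper's hyperparameters; in the paper, the same per-configuration seed also controls this training randomness (e.g., initialization).

```python
# Hedged sketch of training with Adam, a class-weighted (balanced) BCE loss,
# and patience-based early stopping on the validation loss. Hyperparameters
# and the full-batch update are simplifying placeholders; inputs are assumed
# to already be torch tensors (e.g., via torch.from_numpy(...)).
import copy
import torch
import torch.nn as nn

def train(model, X_tr, y_tr, X_val, y_val, lr=1e-3, patience=10, max_epochs=200):
    # Up-weight the rare (positive) class so the loss is balanced.
    pos_weight = (y_tr == 0).float().sum() / (y_tr == 1).float().sum()
    loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    best_val, best_state, since_best = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(X_tr).squeeze(-1), y_tr.float())
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(X_val).squeeze(-1), y_val.float()).item()

        if val_loss < best_val:
            best_val, best_state, since_best = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            since_best += 1
            if since_best >= patience:  # patience-based stopping criterion
                break

    model.load_state_dict(best_state)
    return model
```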