Learning Certifiably Optimal Rule Lists for Categorical Data

Authors: Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, Cynthia Rudin

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present the design and implementation of a custom discrete optimization technique for building rule lists over a categorical feature space. Our algorithm produces rule lists with optimal training performance, according to the regularized empirical risk, with a certificate of optimality. By leveraging algorithmic bounds, efficient data structures, and computational reuse, we achieve several orders of magnitude speedup in time and a massive reduction of memory consumption. We demonstrate that our approach produces optimal rule lists on practical problems in seconds. Our results indicate that it is possible to construct optimal sparse rule lists that are approximately as accurate as the COMPAS proprietary risk prediction tool on data from Broward County, Florida, but that are completely interpretable. We evaluated CORELS on a number of publicly available data sets. Our metric of success was 10-fold cross-validated prediction accuracy on a subset of the data.
Researcher Affiliation | Academia | Elaine Angelino (EMAIL), Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720; Nicholas Larus-Stone (EMAIL), Daniel Alabi (EMAIL), Margo Seltzer (EMAIL), School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138; Cynthia Rudin (EMAIL), Department of Computer Science and Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708
Pseudocode | Yes | Algorithm 1: Branch-and-bound for learning rule lists. Algorithm 2: Incremental branch-and-bound for learning rule lists, for simplicity, from a cold start. Algorithm 3: Incremental objective lower bound (30), used in Algorithm 2. Algorithm 4: Incremental objective function (32), used in Algorithm 2. Algorithm 5: The inner loop of CORELS, which evaluates all children of a prefix dp. Algorithm 6: Possibly insert a prefix into CORELS data structures, after first checking the symmetry-aware map, which supports search space pruning triggered by the permutation bound (Corollary 16).
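The algorithms listed above are the paper's own pseudocode. As a rough orientation, the core idea of Algorithm 1, a prioritized branch-and-bound over rule-list prefixes, pruned by an objective lower bound and a one-step lookahead, can be sketched in a few dozen lines of Python. This toy sketch is illustrative only: the rule representation, the helper `prefix_stats`, and the tiny data are assumptions, not the authors' C++ implementation, and it omits CORELS's data structures and most of its bounds.

```python
import heapq
from itertools import count

def best_rule_list(rules, labels, lam):
    """Toy branch-and-bound over rule-list prefixes.

    rules:  dict mapping a rule name to the set of example indices it captures
            (hypothetical representation for this sketch).
    labels: list of 0/1 labels.
    lam:    per-rule regularization (the paper's lambda).
    """
    n = len(labels)

    def prefix_stats(prefix):
        """Return (lower_bound, objective) for a prefix of (rule, prediction) pairs."""
        captured, mistakes = set(), 0
        for name, pred in prefix:
            newly = rules[name] - captured  # examples captured by this rule, not earlier ones
            mistakes += sum(1 for i in newly if labels[i] != pred)
            captured |= rules[name]
        # Lower bound: mistakes already incurred plus regularization; extensions
        # can only add mistakes and rules, so this bounds every completion.
        lb = mistakes / n + lam * len(prefix)
        # Objective: complete the prefix with the best default prediction.
        rest = [labels[i] for i in range(n) if i not in captured]
        default_mistakes = min(rest.count(0), rest.count(1)) if rest else 0
        obj = (mistakes + default_mistakes) / n + lam * len(prefix)
        return lb, obj

    best_obj, best_prefix = prefix_stats(())[1], ()
    tie = count()  # tie-breaker so the heap never compares prefixes
    queue = [(0.0, next(tie), ())]
    while queue:
        lb, _, prefix = heapq.heappop(queue)
        if lb + lam >= best_obj:  # one-step lookahead: any extension adds >= lam
            continue
        used = {name for name, _ in prefix}
        for name in rules:
            if name in used:
                continue
            for pred in (0, 1):
                child = prefix + ((name, pred),)
                clb, cobj = prefix_stats(child)
                if cobj < best_obj:
                    best_obj, best_prefix = cobj, child
                if clb + lam < best_obj:
                    heapq.heappush(queue, (clb, next(tie), child))
    return best_prefix, best_obj
```

Because the lower bound is valid for every extension of a prefix, pruning on it preserves optimality, which is what lets CORELS return a certificate rather than a heuristic solution.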
Open Source Code | Yes | Our code is at https://github.com/nlarusstone/corels, where we provide the C++ implementation we used in our experiments (Section 6).
Open Datasets | Yes | We illustrate the efficacy of our approach using (1) the Pro Publica COMPAS data set (Larson et al., 2016), for the problem of two-year recidivism prediction, and (2) stop-and-frisk data sets from the NYPD (New York Police Department, 2016) and the NYCLU (New York Civil Liberties Union, 2014), to predict whether a weapon will be found on a stopped individual who is frisked or searched.
Dataset Splits | Yes | Our metric of success was 10-fold cross-validated prediction accuracy on a subset of the data. Resampling due to class imbalance, for 10-fold cross-validation, yields training sets that each contain 566,839 datapoints. (We form corresponding test sets without resampling.) In our 10-fold cross-validation experiments, each training set contains 50,743 observations.
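The protocol quoted above, 10-fold cross-validation where class-imbalance resampling is applied to training folds only and test folds are left untouched, can be sketched as follows. The fold construction and the upsample-with-replacement scheme are assumptions for illustration; the paper does not specify its exact resampling procedure.

```python
import random

def ten_fold_splits(labels, seed=0):
    """Yield (train_indices, test_indices) for 10-fold cross-validation,
    upsampling the minority class in each training fold only."""
    n = len(labels)
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::10] for i in range(10)]  # 10 disjoint folds
    for k in range(10):
        test = folds[k]  # held out as-is, no resampling
        train = [i for j in range(10) if j != k for i in folds[j]]
        pos = [i for i in train if labels[i] == 1]
        neg = [i for i in train if labels[i] == 0]
        minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
        # Upsample the minority class with replacement until classes balance.
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
        yield minority + extra + majority, test
```

Resampling only the training side is what keeps the reported test accuracy an honest estimate on the original class distribution.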
Hardware Specification | Yes | All timed results ran on a server with an Intel Xeon E5-2699 v4 (55 MB cache, 2.20 GHz) processor and 264 GB RAM, and we ran each timing measurement separately, on a single hardware thread, with nothing else running on the server.
Software Dependencies | No | The paper mentions using R packages (rpart, RWeka, and caret) and a C++ implementation for CORELS, as well as referring to an SBRL C implementation. However, it does not provide specific version numbers for any of these software components, which is required for a reproducible description.
Experiment Setup | Yes | For CORELS, we use regularization parameter λ = 0.005. By default, SBRL sets η = 3, λ = 9, the number of chains to 11 and iterations to 1,000. By default, CART uses complexity parameter cp = 0.01 and C4.5 uses complexity parameter C = 0.25.
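For quick reference, the reported settings can be collected into a single configuration. Only the values come from the paper's text; the dictionary and its key names are illustrative, not options of any particular package.

```python
# Hyperparameters reported in the paper's experiment setup.
# Key names are illustrative shorthand, not real package options.
EXPERIMENT_SETUP = {
    "CORELS": {"lambda": 0.005},                                   # regularization
    "SBRL": {"eta": 3, "lambda": 9, "chains": 11, "iterations": 1000},  # defaults
    "CART": {"cp": 0.01},                                          # complexity parameter
    "C4.5": {"C": 0.25},                                           # complexity parameter
}
```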