Interpretable and Fair Boolean Rule Sets via Column Generation
Authors: Connor Lawless, Sanjeeb Dash, Oktay Gunluk, Dennis Wei
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the performance of our approach, we present a suite of numerical results. We start by describing the data sets, experimental setup, and computational environment that we used. In Section 5.2, we present an empirical study aimed at understanding the effect of using Hamming loss as a proxy for 0-1 loss. We benchmark our approach against state-of-the-art algorithms in both the traditional classification setting (Section 5.3) and the fairness setting (Section 5.4). |
| Researcher Affiliation | Collaboration | Connor Lawless¹, Sanjeeb Dash², Oktay Günlük¹, Dennis Wei² — ¹ Operations Research and Information Engineering, Cornell University, Ithaca, NY 14850; ² IBM Research, Yorktown Heights, NY 10598 |
| Pseudocode | No | The paper describes the column generation procedure and Pricing Problem in prose and mathematical formulations, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' section with structured, step-by-step instructions. Figure 1 is a flowchart, not pseudocode. |
| Open Source Code | No | The paper mentions publicly available code for benchmark methods like 'Fair CORELS (Aïvodji et al., 2021)' and 'BRS' but does not explicitly state that the authors are releasing the source code for their own methodology described in this paper. |
| Open Datasets | Yes | Evaluations were conducted on 15 classification data sets from the UCI repository (Dua and Karra Taniskidou, 2017) that have been used in recent works on rule set/Boolean classifiers (Malioutov and Varshney, 2013; Dash et al., 2014; Su et al., 2016; Wang et al., 2017). In addition, we used data from the FICO Explainable Machine Learning Challenge (FICO). For all three data sets we include the sensitive attribute as a feature for prediction as their exclusion is both not enough to ensure fairness and may lead to worse fairness outcomes (see Corbett-Davies and Goel, 2018, for a discussion). For the compas data from Pro Publica (Angwin et al., 2016) we use the fair machine learning cleaned data set from Kaggle. |
| Dataset Splits | Yes | Test performance on all data sets is estimated using 10-fold stratified cross-validation (CV). Unless otherwise stated, all hyperparameters were tuned using nested 10-fold cross-validation. All tables in this section are grouped by problem size in increasing order (i.e., first section corresponds to small data sets, the second to medium and large data sets). |
| Hardware Specification | Yes | All results in this section were obtained on a personal computer with a 3 GHz processor and 16 GB of RAM. All the linear and integer programs were solved using Gurobi 9.0 (Gurobi Optimization, 2020). All results in this section were obtained using a single 2.0 GHz core of a server with 64 GB of memory (only a small fraction of which was used). |
| Software Dependencies | Yes | All the linear and integer programs were solved using Gurobi 9.0 (Gurobi Optimization, 2020). All linear and integer programs were solved using CPLEX 12.7.1 (Cplex, 2009). Additional comparisons include the WEKA (Frank et al., 2016) JRip implementation of RIPPER (Cohen, 1995), a rule set learner that is still state-of-the-art in accuracy, and the scikit-learn (Pedregosa et al., 2011) implementations of the decision tree learner CART (Breiman et al., 1984) and Random Forests (RF) (Breiman, 2001). |
| Experiment Setup | Yes | For CG, we used an overall time limit of 300 seconds for training and a time limit of 45 seconds for solving the Pricing Problem in each iteration. In these experiments, CG was given an overall time limit of 120 seconds for each candidate value of C and the time limit for the Pricing Problem was set to 30 seconds. To offset the decrease in the time limit, we performed a second pass for each data set, solving the restricted MIP with all the rules generated for all possible choices of C. Mean test accuracy (over 10 partitions) and rule set complexity are reported in Tables 2 and 3. For BRS, we fixed κ = 1 as optimizing κ did not improve accuracy on the whole (as can be expected from Figure 4). Tables 2 and 3 also include results from RIPPER, CART, and RF. We tuned the minimum number of samples per leaf for CART and RF, used 100 trees for RF, and otherwise kept the default settings. |
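The Section 5.2 study quoted above rests on a simple relationship between the two losses: for a DNF rule set, 0-1 loss charges each misclassified sample once, while Hamming loss charges an uncovered positive sample once and a covered negative sample once per clause that covers it, so Hamming loss upper-bounds 0-1 loss. A minimal sketch of that relationship on toy data (the clause-coverage matrix here is illustrative, not from the paper):

```python
import numpy as np

def zero_one_loss(y, clause_hits):
    """Fraction misclassified; the rule set predicts positive iff any clause fires."""
    y_pred = (clause_hits.sum(axis=1) > 0).astype(int)
    return np.mean(y_pred != y)

def hamming_loss(y, clause_hits):
    """Uncovered positives cost 1; negatives cost 1 per clause that covers them."""
    uncovered_pos = ((y == 1) & (clause_hits.sum(axis=1) == 0)).sum()
    neg_cost = clause_hits[y == 0].sum()
    return (uncovered_pos + neg_cost) / len(y)

# toy example: 4 samples, 2 clauses (rows = samples, 1 = clause covers sample)
y = np.array([1, 1, 0, 0])
hits = np.array([[1, 0],   # positive, covered        -> no loss either way
                 [0, 0],   # positive, uncovered      -> cost 1 in both losses
                 [1, 1],   # negative, covered twice  -> 0-1 cost 1, Hamming cost 2
                 [0, 0]])  # negative, uncovered      -> no loss either way
print(zero_one_loss(y, hits))  # 0.5
print(hamming_loss(y, hits))   # 0.75
```

The gap between the two values comes entirely from negatives covered by more than one clause, which is why Hamming loss works well as a proxy when clauses overlap little.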
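The evaluation protocol quoted in the Dataset Splits row (test accuracy via 10-fold stratified CV, hyperparameters via nested 10-fold CV) can be sketched with scikit-learn. The dataset, learner, and hyperparameter grid below are stand-ins for illustration, not the paper's:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the UCI data sets

# Inner loop tunes a hyperparameter; outer loop estimates test performance.
inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {"min_samples_leaf": [1, 5, 10, 25]}, cv=inner)
scores = cross_val_score(grid, X, y, cv=outer)  # one accuracy per outer fold
print(scores.mean())
```

Nesting matters because tuning and evaluating on the same folds would leak the test folds into model selection and bias the reported accuracy upward.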
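The baseline configuration quoted in the Experiment Setup row (tune the minimum number of samples per leaf for CART and RF, fix RF at 100 trees, keep other defaults) maps directly onto scikit-learn. The dataset and candidate grid here are assumptions for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
leaf_grid = {"min_samples_leaf": [1, 5, 10]}  # illustrative candidate values

# CART: tune min_samples_leaf, otherwise defaults
cart = GridSearchCV(DecisionTreeClassifier(random_state=0), leaf_grid, cv=cv)
# RF: 100 trees fixed, tune min_samples_leaf, otherwise defaults
rf = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=0),
                  leaf_grid, cv=cv)
cart.fit(X, y)
rf.fit(X, y)
print(cart.best_params_, rf.best_params_)
```

Note this snippet only shows the tuning step; the paper's reported numbers come from wrapping such a search inside the outer cross-validation loop.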