Convex Regression with Interpretable Sharp Partitions

Authors: Ashley Petersen, Noah Simon, Daniela Witten

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set.
Researcher Affiliation | Academia | Department of Biostatistics, University of Washington, Seattle, WA 98195
Pseudocode | Yes | Algorithm 1 Alternating Directions Method of Multipliers for Equation (4) ... Algorithm 2 Block Coordinate Descent for CRISP with p > 2 (Equation (13))
Open Source Code | No | The paper mentions 'Our Python implementation of CRISP' and 'FLAM (implemented with the R package flam (Petersen, 2014)); CART (implemented with the R package rpart (Therneau et al., 2014)); TPS (implemented with the R package fields (Nychka et al., 2014))', but it does not provide an explicit link or statement about the open-sourcing of the code for the CRISP methodology itself. The R packages mentioned are for third-party or related methods, not the direct implementation of CRISP.
Open Datasets | Yes | The data set was originally considered in Pace and Barry (1997) and is publicly available from the Carnegie Mellon Stat Lib data repository (lib.stat.cmu.edu).
Dataset Splits | Yes | We consider five different training set sizes: 100, 500, 1000, 5000, and 11,198 (which corresponds to 60% of the observations). We use the observations not selected for the training set as the test set.
Hardware Specification | Yes | On a Macbook Pro with a 2.0 GHz Intel Sandy Bridge Core i7 processor, our Python implementation of CRISP with n = q = 50 takes 20.1 seconds for a sequence of 20 λ values.
Software Dependencies | Yes | FLAM (implemented with the R package flam (Petersen, 2014)); CART (implemented with the R package rpart (Therneau et al., 2014)); TPS (implemented with the R package fields (Nychka et al., 2014))... R package version 1.0 for flam, R package version 4.1-8 for rpart, R package version 7.1 for fields.
Experiment Setup | Yes | We generate data with either n = 100 or n = 10,000, and p = 2. We independently sample each element of x1 and x2 from a Unif[−2.5, 2.5] distribution, and then take y = f(x1, x2) + ϵ, where ϵ ∼ MVN(0, σ²I_n) with σ = 1 for n = 100 and σ = 10 for n = 10,000. ... For each scenario, we generate 200 data sets and estimate M using CRISP (with q = 100) and several competitors.
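
The simulation design and the train/test split described in the entries above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the mean function `f_placeholder` is hypothetical (the summary does not say which f each scenario uses), the helper names `simulate` and `split_train_test` are invented here, and the split assumes training observations are chosen uniformly at random, which the summary does not state.

```python
import random


def f_placeholder(x1, x2):
    """Hypothetical mean function; the summary does not specify f."""
    return 1.0 if x1 * x2 > 0 else -1.0


def simulate(n, sigma, f=f_placeholder, seed=0):
    """Draw one data set: x1, x2 ~ Unif[-2.5, 2.5], y = f(x1, x2) + noise."""
    rng = random.Random(seed)
    x1 = [rng.uniform(-2.5, 2.5) for _ in range(n)]
    x2 = [rng.uniform(-2.5, 2.5) for _ in range(n)]
    # MVN(0, sigma^2 I_n) has independent components, so each noise term
    # is an independent N(0, sigma^2) draw.
    y = [f(a, b) + rng.gauss(0.0, sigma) for a, b in zip(x1, x2)]
    return x1, x2, y


def split_train_test(n_obs, n_train, seed=0):
    """Assumed random split: n_train training indices, the rest are test."""
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


# The two settings quoted above: (n = 100, sigma = 1) and (n = 10,000, sigma = 10).
small = simulate(n=100, sigma=1.0)
large = simulate(n=10_000, sigma=10.0)
```

Repeating `simulate` with 200 different seeds would give the 200 data sets per scenario mentioned above; fitting CRISP or its competitors to them is outside the scope of this sketch.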