A Distributed Method for Fitting Laplacian Regularized Stratified Models
Authors: Jonathan Tuck, Shane Barratt, Stephen Boyd
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the ideas and method with several examples. In this section we illustrate the effectiveness of stratified models by combining base fitting methods and regularization graphs to create stratified models. In each example, we fit three models: a stratified model without Laplacian regularization (which we refer to as a separate model), a common model without stratification, and a stratified model with nontrivial edge weights. For each model, we performed a validation technique and selected the hyper-parameters that performed best over this technique. Even with these highly simplified Laplacian regularized stratified models, we find that the stratified model significantly outperforms the other two methods in each example. The code is available online at https://github.com/cvxgrp/strat_models. All numerical experiments were performed on an unloaded Intel i7-8700K CPU. |
| Researcher Affiliation | Academia | Jonathan Tuck EMAIL, Shane Barratt EMAIL, Stephen Boyd EMAIL Department of Electrical Engineering Stanford University Stanford, CA 94305, USA |
| Pseudocode | Yes | Algorithm 4.1 Distributed method for fitting stratified models with Laplacian regularization. |
| Open Source Code | Yes | We provide an (easily extensible) implementation of the ideas described in the paper, available at www.github.com/cvxgrp/strat_models. |
| Open Datasets | Yes | Mesothelioma classification: We obtained data describing 324 patients from the Dicle University Faculty of Medicine (Er et al., 2012; Dua and Graff, 2019). Senate elections: We obtained data describing the outcome of every United States Senate election from 1976 to 2016 (every two years, 21 time periods) for all 50 states (Data and Lab, 2017). Chicago crime prediction: We downloaded a dataset of crime records from the greater Chicago area, collected by the Chicago police department, which include the time, location, and type of crime (Department, 2019). |
| Dataset Splits | Yes | Mesothelioma classification: We randomly split the data into five folds. House price prediction: We randomly split the dataset into five folds. Senate elections: We created a training dataset consisting of the outcomes of every Senate election from 1976 to 2012, and a test dataset using 2014 and 2016. Chicago crime prediction: From the dataset, we created a training set, composed of the recorded crimes in 2017, and a test set, composed of the recorded crimes in 2018. |
| Hardware Specification | Yes | All numerical experiments were performed on an unloaded Intel i7-8700K CPU. |
| Software Dependencies | No | The paper lists several software components (numpy, scipy, networkx, torch, multiprocessing) but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Mesothelioma classification: We used γlocal = 12.9 for the separate model and γlocal = 0.004 for the common model; we used γlocal = 0.52, γsex = 10, and γage = 500 for the stratified model, which were obtained by selecting the hyper-parameter combination that minimized the 5-fold cross-validated average negative log likelihood (ANLL). House price prediction: We used γlocal = 0.001 for the separate model, γlocal = 0.001 for the common model, and γlocal = 0.001 and γgeo = 18.5 for the stratified model. Senate elections: We ran the fitting method with γstate = 1 and γyear = 4. Chicago crime prediction: We ran the fitting method with γloc = γweek = γday = γhour = 100. |
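
To make the role of the edge weights in the Experiment Setup row concrete, the following is a minimal sketch of how values such as γsex = 10 and γage = 500 could enter a Laplacian regularization term. It is not taken from the `strat_models` package. The graph topology (one edge between two sex nodes, a path graph over age bins, combined as a Cartesian product), the five age bins, and all function names are assumptions made for illustration. The penalty is written as a weighted sum of squared parameter differences over edges, and any constant scaling used in the paper is not reproduced here.

```python
def path_edges(nodes, weight):
    """Weighted edges of a path graph linking consecutive nodes."""
    return [(a, b, weight) for a, b in zip(nodes, nodes[1:])]


def cartesian_product_edges(nodes_a, edges_a, nodes_b, edges_b):
    """Weighted edges of the Cartesian product of two graphs.

    Product nodes are pairs (a, b). Two pairs are joined when they differ
    in exactly one coordinate and that difference is an edge in the
    corresponding factor graph; the edge keeps the factor's weight.
    """
    edges = []
    for a1, a2, w in edges_a:
        for b in nodes_b:
            edges.append(((a1, b), (a2, b), w))
    for b1, b2, w in edges_b:
        for a in nodes_a:
            edges.append(((a, b1), (a, b2), w))
    return edges


def laplacian_penalty(theta, edges):
    """Sum over edges of weight * squared Euclidean distance between
    the parameter vectors stored at the edge's two endpoints."""
    total = 0.0
    for u, v, w in edges:
        total += w * sum((x - y) ** 2 for x, y in zip(theta[u], theta[v]))
    return total


# Hypothetical stratification: 2 sexes x 5 age bins (bin count is assumed).
sexes = ["F", "M"]
ages = [0, 1, 2, 3, 4]
gamma_sex, gamma_age = 10.0, 500.0  # values quoted in the table above

sex_edges = [("F", "M", gamma_sex)]
age_edges = path_edges(ages, gamma_age)
edges = cartesian_product_edges(sexes, sex_edges, ages, age_edges)

# One-dimensional parameters per node, chosen only to exercise the penalty.
theta_constant = {(s, a): [1.0] for s in sexes for a in ages}
theta_age_trend = {(s, a): [float(a)] for s in sexes for a in ages}

penalty_constant = laplacian_penalty(theta_constant, edges)
penalty_age_trend = laplacian_penalty(theta_age_trend, edges)
```

Under these assumptions, the much larger γage means parameter differences between adjacent age bins are penalized far more heavily than differences between the two sex nodes, so the fitted parameters are pushed to vary smoothly with age.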