reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Distributed Method for Fitting Laplacian Regularized Stratified Models

Authors: Jonathan Tuck, Shane Barratt, Stephen Boyd

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We illustrate the ideas and method with several examples. In this section we illustrate the effectiveness of stratified models by combining base fitting methods and regularization graphs to create stratified models. In each example, we fit three models: a stratified model without Laplacian regularization (which we refer to as a separate model), a common model without stratification, and a stratified model with nontrivial edge weights. For each model, we performed a validation technique and selected the hyper-parameters that performed best over this technique. Even with these highly simplified Laplacian regularized stratified models, we find that the stratified model significantly outperforms the other two methods in each example. The code is available online at https://github.com/cvxgrp/strat_models. All numerical experiments were performed on an unloaded Intel i7-8700K CPU.
Researcher Affiliation	Academia	Jonathan Tuck EMAIL, Shane Barratt EMAIL, Stephen Boyd EMAIL Department of Electrical Engineering Stanford University Stanford, CA 94305, USA
Pseudocode	Yes	Algorithm 4.1 Distributed method for fitting stratified models with Laplacian regularization.
Open Source Code	Yes	We provide an (easily extensible) implementation of the ideas described in the paper, available at www.github.com/cvxgrp/strat_models.
Open Datasets	Yes	Mesothelioma classification: We obtained data describing 324 patients from the Dicle University Faculty of Medicine (Er et al., 2012; Dua and Graff, 2019). Senate elections: We obtained data describing the outcome of every United States Senate election from 1976 to 2016 (every two years, 21 time periods) for all 50 states (Data and Lab, 2017). Chicago crime prediction: We downloaded a dataset of crime records from the greater Chicago area, collected by the Chicago police department, which include the time, location, and type of crime (Department, 2019).
Dataset Splits	Yes	Mesothelioma classification: We randomly split the data into five folds. House price prediction: We randomly split the dataset into five folds. Senate elections: We created a training dataset consisting of the outcomes of every Senate election from 1976 to 2012, and a test dataset using 2014 and 2016. Chicago crime prediction: From the dataset, we created a training set, composed of the recorded crimes in 2017, and a test set, composed of the recorded crimes in 2018.
Hardware Specification	Yes	All numerical experiments were performed on an unloaded Intel i7-8700K CPU.
Software Dependencies	No	The paper lists several software components (numpy, scipy, networkx, torch, multiprocessing) but does not provide specific version numbers for these dependencies.
Experiment Setup	Yes	Mesothelioma classification: We used γ_local = 12.9 for the separate model and γ_local = 0.004 for the common model; we used γ_local = 0.52, γ_sex = 10, and γ_age = 500 for the stratified model, which were obtained by selecting the hyper-parameter combination that minimized the 5-fold cross-validated average negative log likelihood (ANLL). House price prediction: We used γ_local = 0.001 for the separate model, γ_local = 0.001 for the common model, and γ_local = 0.001 and γ_geo = 18.5 for the stratified model. Senate elections: We ran the fitting method with γ_state = 1 and γ_year = 4. Chicago crime prediction: We ran the fitting method with γ_loc = γ_week = γ_day = γ_hour = 100.
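
To make the role of the edge-weight hyper-parameters in the Experiment Setup row more concrete, the following is a minimal sketch of a Laplacian regularization penalty on a small regularization graph. It is not taken from the paper or from the strat_models package. It assumes the graph is a Cartesian product of a two-node sex graph and a path graph over age bins, and it assumes the penalty is the usual graph-Laplacian quadratic form with a factor of 1/2 per edge. The helper names (`path_edges`, `cartesian_product_edges`, `laplacian_penalty`), the three age bins, and the parameter vectors are all hypothetical; only the sex and age weights (10 and 500) come from the row above.

```python
from itertools import product


def path_edges(nodes, weight):
    """Edges of a path graph: consecutive nodes share the same weight."""
    return {(a, b): weight for a, b in zip(nodes, nodes[1:])}


def cartesian_product_edges(nodes_a, edges_a, nodes_b, edges_b):
    """Weighted edges of the Cartesian product of two graphs.

    Product nodes are pairs (a, b). Two pairs are adjacent when they differ
    in exactly one coordinate and that coordinate is an edge in its own graph.
    """
    edges = {}
    for (a1, a2), w in edges_a.items():
        for b in nodes_b:
            edges[((a1, b), (a2, b))] = w
    for (b1, b2), w in edges_b.items():
        for a in nodes_a:
            edges[((a, b1), (a, b2))] = w
    return edges


def laplacian_penalty(theta, edges):
    """Sum over edges of (w / 2) * ||theta_i - theta_j||^2 (assumed scaling)."""
    total = 0.0
    for (i, j), w in edges.items():
        sq_dist = sum((x - y) ** 2 for x, y in zip(theta[i], theta[j]))
        total += 0.5 * w * sq_dist
    return total


# Edge weights reported for the mesothelioma stratified model.
GAMMA_SEX, GAMMA_AGE = 10.0, 500.0

# Hypothetical stratification: two sexes and three age bins.
sexes = ["F", "M"]
ages = [0, 1, 2]
edges = cartesian_product_edges(
    sexes, {("F", "M"): GAMMA_SEX}, ages, path_edges(ages, GAMMA_AGE)
)

# Hypothetical per-stratum parameters: one 2-vector per (sex, age) node.
theta = {
    (s, a): [float(a), 1.0 if s == "M" else 0.0]
    for s, a in product(sexes, ages)
}

penalty = laplacian_penalty(theta, edges)
print(penalty)
```

With these toy values the graph has 7 edges and the penalty evaluates to 1015.0, almost all of it from the age edges. Under the stated assumptions, a large age weight relative to the sex weight pushes the models for neighboring age bins toward each other much more strongly than the models for the two sexes.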