Bayesian Model Selection with Graph Structured Sparsity

Authors: Youngseok Kim, Chao Gao

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulation studies and real data applications are conducted to demonstrate the superior performance of our methods over its frequentist competitors such as ℓ0 or ℓ1 penalization. Extensive simulation studies and real data analysis will be presented in Section 7.
Researcher Affiliation | Academia | Youngseok Kim (EMAIL), Chao Gao (EMAIL); Department of Statistics, University of Chicago, Chicago, IL 60637, USA
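Pseudocode | Yes | Algorithm 1 (A fast DLPA). Input: initialize $u_1$, $u_2$ and $z_2$. Repeat: $z_1^{\text{new}} = (I_{n_1} + L_{q_1})^{-1}(z_2 + u_2)$; $z_2^{\text{new}} = (z_1^{\text{new}} + u_1)(I_{n_2} + L_{q_2})^{-1}$; $u_1^{\text{new}} = z_2 + u_2 - z_1^{\text{new}}$, $u_2^{\text{new}} = z_1^{\text{new}} + u_2 - z_2^{\text{new}}$; $z_1 = z_1^{\text{new}}$, $z_2 = z_2^{\text{new}}$, $u_1 = u_1^{\text{new}}$, $u_2 = u_2^{\text{new}}$; until convergence criteria met. Output: $\theta = z_2$.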
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions using 'R and Julia programming languages' and refers to several existing R packages, but not for the authors' own implementation.
Open Datasets | Yes | The global warming data has been studied previously by Wu et al. (2001); Tibshirani et al. (2011). The same data set has also been used by Bhattacharjee et al. (2001); Lee et al. (2010); Sill et al. (2011); Chi et al. (2017). The Chicago crime data is publicly available at the Chicago Police Department website. The Chicago crime data set can be retrieved from https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present.
Dataset Splits | No | The paper describes generating data for simulation studies (e.g., 'We generate data according to $y \sim N(\beta, \sigma^2 I_n)$ with $n = 1000$ and $\sigma \in \{0.1, 0.2, 0.3, 0.4, 0.5\}$') and mentions 'test data sets' (e.g., 'Tables 3 displays the estimation error (MSEs) $\|\theta - \hat{\theta}\|^2$ on the test data sets'). However, it does not provide explicit details about how these datasets were split into training, validation, or test sets (e.g., specific percentages, sample counts, or predefined split citations). For real data applications, subsets are used (e.g., 'a subset with 56 samples and 100 genes') without split details.
Hardware Specification | Yes | All simulation studies and real data applications were conducted on a standard laptop (2.6 GHz Intel Core i7 processor and 16GB memory) using R and Julia programming languages.
Software Dependencies | No | The paper mentions using 'R and Julia programming languages' but does not specify their versions. It also refers to several R packages like 'glmnet R package', 'BGLR R package', 'genlasso R package (Arnold and Tibshirani, 2014)', 'ITALE R package', 'lqa R package' for OSCAR, and 'blockcluster, sparse BC and cvxbiclustr', but specific version numbers for these packages are not provided. The year mentioned for some packages (e.g., 2014) refers to the publication date of a related paper, not the software version.
Experiment Setup | Yes | The tuning parameter $\lambda$ in (68) is selected by cross validation using the default method of the R package genlasso (Arnold and Tibshirani, 2014). For ℓ0-pen, the $\lambda$ in (69) is selected using the method suggested by Fan and Guan (2018). The hyper-parameters $a, b, A, B$ in (5) and (6) are all set as the default value 1. For the biclustering models, we use $n_1 = 56$, $n_2 = 100$, $k_1 = 10$, and $k_2 = 20$. To pursue a more flexible procedure of model selection, we use two independent pairs of $(v_0, v_1)$ for the row structure and the column structure. To be specific, let $(v_0, v_1)$ be the parameters for the row structure, and the parameters for the column structure are set as $(cv_0, cv_1)$ with some $c \in \{1/10, 1/5, 1/2, 1, 2, 5, 10\}$.
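
The two linear solves in Algorithm 1 (Pseudocode row) have the form of proximal steps for quadratic row and column penalties, so the iteration can be illustrated with a textbook Dykstra-like proximal loop. The following is a minimal sketch in Python, not the authors' implementation (the paper reports using R and Julia). It rests on several assumptions that are not stated on this page: the objective is taken to be 0.5*||x - r||_F^2 + 0.5*tr(x' L1 x) + 0.5*tr(x L2 x'), the matrices L1 and L2 are symmetric positive semidefinite, and the function names `dykstra_two_quadratics` and `path_laplacian`, the correction variables `p` and `q`, and the stopping rule are all hypothetical choices made for illustration. The bookkeeping of `p` and `q` follows the standard Dykstra-like scheme and is not claimed to match the labeling of $u_1$, $u_2$ in the quoted algorithm.

```python
import numpy as np


def path_laplacian(n):
    """Unweighted Laplacian of a path graph on n nodes (illustrative choice)."""
    D = np.diff(np.eye(n), axis=0)  # (n-1) x n incidence matrix
    return D.T @ D


def dykstra_two_quadratics(r, L1, L2, tol=1e-10, max_iter=10_000):
    """Dykstra-like proximal iteration for the assumed objective
        0.5*||x - r||_F^2 + 0.5*tr(x' L1 x) + 0.5*tr(x L2 x'),
    with L1 (n1 x n1) and L2 (n2 x n2) symmetric positive semidefinite.
    """
    n1, n2 = r.shape
    A1 = np.eye(n1) + L1  # row step: solve A1 @ y = v
    A2 = np.eye(n2) + L2  # column step: solve x @ A2 = v
    x = r.copy()
    p = np.zeros_like(r)  # correction paired with the row step
    q = np.zeros_like(r)  # correction paired with the column step
    for _ in range(max_iter):
        y = np.linalg.solve(A1, x + p)
        p = x + p - y
        # A2 is symmetric, so x @ A2 = v is equivalent to A2 @ x.T = v.T
        x_new = np.linalg.solve(A2, (y + q).T).T
        q = y + q - x_new
        converged = np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x))
        x = x_new
        if converged:
            break
    return x


# Small synthetic check: at the minimizer, (I + L1) x + x L2 = r.
rng = np.random.default_rng(0)
r = rng.normal(size=(6, 4))
L1, L2 = path_laplacian(6), path_laplacian(4)
theta = dykstra_two_quadratics(r, L1, L2)
residual = (np.eye(6) + L1) @ theta + theta @ L2 - r
print(np.abs(residual).max())
```

Under these assumptions the minimizer satisfies the Sylvester-type equation (I + L1) x + x L2 = r, so the printed residual gives a quick sanity check. Each iteration needs only two solves with fixed matrices, which could be factorized once up front for larger problems.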