Tree-based Node Aggregation in Sparse Graphical Models

Authors: Ines Wilms, Jacob Bien

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We investigate the advantages of jointly exploiting node aggregation and edge sparsity in graphical models. Section 4 presents the results of a simulation study. Section 5 illustrates the practical advantages of the tag-lasso on financial and microbiome data sets. We evaluate the estimators in terms of three performance metrics: estimation accuracy, aggregation performance, and sparsity recovery."
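The quoted excerpt names sparsity recovery as one of the three performance metrics but does not define it. A common way to score it is true/false positive rates on the off-diagonal support of the estimated precision matrix; the sketch below is an illustrative assumption, not the paper's exact metric.

```python
import numpy as np

def sparsity_recovery(theta_hat, theta_true, tol=1e-8):
    """TPR/FPR on off-diagonal support: a standard sparsity-recovery
    score, assumed here since the paper's definition is not quoted."""
    off = ~np.eye(theta_true.shape[0], dtype=bool)   # ignore the diagonal
    est = np.abs(theta_hat[off]) > tol               # estimated edges
    tru = np.abs(theta_true[off]) > tol              # true edges
    tpr = est[tru].mean() if tru.any() else 1.0
    fpr = est[~tru].mean() if (~tru).any() else 0.0
    return tpr, fpr

# Toy 3x3 example: one true edge (1,2); estimate adds a spurious edge (1,3).
theta_true = np.array([[1.0, 0.5, 0.0],
                       [0.5, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
theta_hat = np.array([[1.0, 0.4, 0.1],
                      [0.4, 1.0, 0.0],
                      [0.1, 0.0, 1.0]])
tpr, fpr = sparsity_recovery(theta_hat, theta_true)
```

Here the true edge is recovered (TPR = 1.0) while two of the four true zeros are falsely flagged (FPR = 0.5).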
Researcher Affiliation | Academia | Ines Wilms (EMAIL), Department of Quantitative Economics, Maastricht University, Maastricht, The Netherlands; Jacob Bien (EMAIL), Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, California, USA
Pseudocode | Yes | Algorithm 1: Compute partition matrix from tag-lasso solution; Algorithm 2: LA-ADMM; Algorithm 3: ADMM
Open Source Code | Yes | "An R package called taglasso implements the proposed method and is available on the GitHub page (https://github.com/ineswilms/taglasso) of the first author."
Open Datasets | Yes | "We demonstrate our method on a financial data set containing daily realized variances of p = 31 stock market indices from across the world in 2019 (n = 254). Daily realized variances based on five-minute returns are taken from the Oxford-Man Institute of Quantitative Finance (publicly available at http://realized.oxford-man.ox.ac.uk/data/download). We next turn to a data set of gut microbial amplicon data in HIV patients (Rivera-Pinto et al., 2018)."
Dataset Splits | Yes | "To select the tuning parameters λ1 and λ2, we form a 10 × 10 grid of (λ1, λ2) values and find the pair that minimizes a 5-fold cross-validated likelihood-based score. We take a random sample of n = 203 observations (80% of the full data set) to form a training sample covariance matrix and use the remaining data to form a test sample covariance matrix Stest, and repeat this procedure ten times."
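The quoted splitting procedure (ten random 80/20 splits of the n = 254, p = 31 financial data set into training and test sample covariance matrices) can be sketched as follows; the data matrix here is a random placeholder, since the sketch only illustrates the split mechanics, not the actual realized-variance data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 254, 31                          # sizes quoted for the financial data set
X = rng.standard_normal((n, p))         # placeholder for the realized-variance data

for rep in range(10):                   # the paper repeats the split ten times
    idx = rng.permutation(n)
    n_train = int(0.8 * n)              # 203 of the 254 observations
    train, test = idx[:n_train], idx[n_train:]
    S_train = np.cov(X[train], rowvar=False)   # training sample covariance
    S_test = np.cov(X[test], rowvar=False)     # "Stest" in the paper
    # S_train feeds the estimator; S_test scores the fitted precision matrix
```

Each repetition yields a fresh (S_train, S_test) pair, so performance is averaged over ten random splits rather than tied to one particular partition.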
Hardware Specification | No | The paper does not provide specific hardware details. It only states that simulations were performed using an R package.
Software Dependencies | No | "All simulations were performed using the simulator package (Bien, 2016) in R (R Core Team, 2017)." While R and a package are mentioned, no specific version numbers for these software components are provided in the text.
Experiment Setup | Yes | "To select the tuning parameters λ1 and λ2, we form a 10 × 10 grid of (λ1, λ2) values and find the pair that minimizes a 5-fold cross-validated likelihood-based score. We use the LA-ADMM algorithm with ρ1 = 0.01, Tstages = 10, maxit = 100."
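To make the quoted LA-ADMM settings (ρ1 = 0.01, Tstages = 10, maxit = 100) concrete, here is a minimal multi-stage ADMM driver applied to a toy lasso problem rather than the tag-lasso itself. The stage-wise doubling of ρ with warm starts is an assumption about LA-ADMM's structure for illustration only; the paper's Algorithms 2 and 3 define the actual updates.

```python
import numpy as np

def soft(v, k):
    """Elementwise soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def staged_admm_lasso(A, b, lam, rho1=0.01, t_stages=10, maxit=100):
    """Multi-stage ADMM for (1/2)||Ax - b||^2 + lam*||x||_1, mimicking
    the quoted settings. The rho-doubling schedule across stages is an
    assumed stand-in for LA-ADMM, not the paper's exact update rules."""
    n, p = A.shape
    x = z = u = np.zeros(p)
    rho = rho1
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(t_stages):
        L = np.linalg.cholesky(AtA + rho * np.eye(p))  # refactor once per stage
        for _ in range(maxit):
            x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
            z = soft(x + u, lam / rho)
            u = u + x - z
        rho *= 2.0   # assumed stage-wise increase, warm-starting x, z, u
    return z

# Toy problem: 3 active coefficients out of 10.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = staged_admm_lasso(A, b, lam=0.5)
```

Starting from a small ρ and increasing it across stages lets early stages make large moves while later stages enforce the consensus constraint tightly, which is the usual motivation for staged or adaptive ADMM schemes.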