High-Dimensional Inference for Cluster-Based Graphical Models
Authors: Carson Eisenach, Florentina Bunea, Yang Ning, Claudiu Dinicu
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As an illustration of the usage of these newly developed inferential tools, we show that they can be reliably used for recovery of the sparsity pattern of the graphs we study, under FDR control, which is verified via simulation studies and an fMRI data analysis. These experimental results confirm the theoretically established difference between the two graph structures. Furthermore, the data analysis suggests that the latent variable graph, corresponding to the unobserved cluster centers, can help provide more insight into the understanding of the brain connectivity networks relative to the simpler, average-based, graph. This section contains simulations and a real data analysis that illustrate the finite sample performance of the inferential procedures developed in the previous sections for the latent variable graph and cluster-average graph, respectively. |
| Researcher Affiliation | Academia | Carson Eisenach (EMAIL), Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA; Florentina Bunea (EMAIL), Yang Ning (EMAIL), Claudiu Dinicu (EMAIL), Department of Statistics and Data Science, Cornell University, Ithaca, NY 14850, USA |
| Pseudocode | Yes | For completeness, we outline the PECOK algorithm below, which consists in a convex relaxation of the K-means algorithm, further tailored to estimation of clusters $G = \{G_1, \dots, G_K\}$ defined via the interpretable model (1). The PECOK algorithm consists in the following three steps: 1. Compute an estimator $\widetilde{\Gamma}$ of the matrix $\Gamma$. 2. Solve the semi-definite program (SDP) $\widehat{B} = \arg\max_{B \in \mathcal{D}} \langle \widehat{\Sigma} - \widetilde{\Gamma}, B \rangle$ (4), where $\widehat{\Sigma}$ is the sample covariance matrix and $\mathcal{D} = \{B : B \succeq 0 \text{ (symmetric and positive semidefinite)},\ \sum_a B_{ab} = 1 \ \forall b,\ B_{ab} \ge 0 \ \forall a, b,\ \mathrm{tr}(B) = K\}$. 3. Compute $\widehat{G}$ by applying a clustering algorithm on the rows (or equivalently columns) of $\widehat{B}$. |
| Open Source Code | No | The paper does not explicitly provide a link to source code developed by the authors for the methods described in this paper, nor does it contain an explicit statement of code release for this work. It mentions the FORCE algorithm and cites Eisenach and Liu (2019) for it, but this is a tool used, not the code for the current paper's methodology. |
| Open Datasets | Yes | As an illustration, we focus on the publicly available resting-state fMRI data from the Neuro-bureau pre-processed repository (Bellec et al., 2015). Specifically, we use the data from patient 1018959, session 1 in the KKI dataset. |
| Dataset Splits | No | The paper describes generating synthetic datasets for simulations but does not specify training/test/validation splits (e.g., percentages, counts, or references to standard splits). For the fMRI dataset, it mentions using data from 148 time periods but does not detail how this data was split for experimental evaluation purposes. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, processor types, memory amounts, or cloud/cluster specifications) used for running the experiments or simulations. |
| Software Dependencies | No | The paper mentions applying the 'FORCE algorithm (Eisenach and Liu, 2019)' but does not provide specific version numbers for this or any other software, libraries, or solvers that would be necessary to replicate the experiments. |
| Experiment Setup | Yes | The regularization parameters λ and λ are chosen by 5-fold cross validation. |
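
The constraints quoted in the Pseudocode row can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation and not an SDP solver: it builds the normalized "partnership" matrix of a given partition (entry 1/|G_k| when two variables share cluster G_k, 0 otherwise), checks the linear constraints from step 2, and evaluates the inner-product objective on a toy covariance. The function names, the toy matrices `S` and `Gamma`, and the assumption that the objective is the inner product of the difference between the sample covariance and the estimate of Γ with B are all hypothetical choices made for this example.

```python
# Illustrative sketch only; all names here are hypothetical, not from the paper's code.

def partnership_matrix(clusters, d):
    """Return B with B[a][b] = 1/|G_k| if a and b share cluster G_k, else 0."""
    B = [[0.0] * d for _ in range(d)]
    for members in clusters:
        weight = 1.0 / len(members)
        for a in members:
            for b in members:
                B[a][b] = weight
    return B


def satisfies_linear_constraints(B, K, tol=1e-9):
    """Check symmetry, B_ab >= 0, sum_a B_ab = 1 for every b, and tr(B) = K.

    Positive semidefiniteness is not tested here. For a partnership matrix it
    holds by construction, since each diagonal block is (1/m) * ones * ones^T.
    """
    d = len(B)
    idx = range(d)
    symmetric = all(abs(B[a][b] - B[b][a]) <= tol for a in idx for b in idx)
    nonnegative = all(B[a][b] >= -tol for a in idx for b in idx)
    columns_ok = all(abs(sum(B[a][b] for a in idx) - 1.0) <= tol for b in idx)
    trace_ok = abs(sum(B[a][a] for a in idx) - K) <= tol
    return symmetric and nonnegative and columns_ok and trace_ok


def objective(S, Gamma, B):
    """Frobenius inner product <S - Gamma, B> (assumed form of the objective)."""
    d = len(B)
    return sum((S[a][b] - Gamma[a][b]) * B[a][b]
               for a in range(d) for b in range(d))


# Toy data (assumption): 5 variables, true clusters {0, 1} and {2, 3, 4}.
d, K = 5, 2
true_clusters = [[0, 1], [2, 3, 4]]
wrong_clusters = [[0, 2], [1, 3, 4]]
label = {a: k for k, members in enumerate(true_clusters) for a in members}

# Covariance 1 within a cluster, 0 across clusters, plus unit noise variance.
S = [[(1.0 if label[a] == label[b] else 0.0) + (1.0 if a == b else 0.0)
      for b in range(d)] for a in range(d)]
Gamma = [[1.0 if a == b else 0.0 for b in range(d)] for a in range(d)]

B_true = partnership_matrix(true_clusters, d)
B_wrong = partnership_matrix(wrong_clusters, d)

print(satisfies_linear_constraints(B_true, K))
print(objective(S, Gamma, B_true), objective(S, Gamma, B_wrong))
```

On this toy input the partition that matches the block structure attains the larger objective value, which is the behaviour the relaxation relies on. An actual run of step 2 would optimize over the whole constraint set with an SDP solver rather than compare two candidate partitions.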