Learning Graphical Models With Hubs
Authors: Kean Ming Tan, Palma London, Karthik Mohan, Su-In Lee, Maryam Fazel, Daniela Witten
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On synthetic data, we demonstrate that our proposed framework outperforms competitors that do not explicitly model hub nodes. We illustrate our proposal on a webpage data set and a gene expression data set. ... In this section, we compare HGL to two sets of proposals: proposals that learn an Erdős–Rényi Gaussian graphical model, and proposals that learn a Gaussian graphical model in which some nodes are highly-connected. ... In this section, we present the results for the simulation study described in Section 4.2 with n = 100, p = 200, and \|H\| = 4. We calculate the proportion of correctly estimated hub nodes with r = 40. The results are shown in Figure 10. |
| Researcher Affiliation | Academia | Kean Ming Tan, Department of Biostatistics, University of Washington, Seattle, WA 98195; Palma London and Karthik Mohan, Department of Electrical Engineering, University of Washington, Seattle, WA 98195; Su-In Lee, Department of Computer Science and Engineering, Genome Sciences, University of Washington, Seattle, WA 98195; Maryam Fazel, Department of Electrical Engineering, University of Washington, Seattle, WA 98195; Daniela Witten, Department of Biostatistics, University of Washington, Seattle, WA 98195 |
| Pseudocode | Yes | Algorithm 1: ADMM Algorithm for Solving (3). 1. Initialize the parameters: (a) primal variables Θ, V, Z, Θ̃, Ṽ, and Z̃ to the p × p identity matrix. (b) dual variables W1, W2, and W3 to the p × p zero matrix. (c) constants ρ > 0 and τ > 0. 2. Iterate until the stopping criterion ‖Θt − Θt−1‖F² / ‖Θt−1‖F² ≤ τ is met, where Θt is the value of Θ obtained at the tth iteration: (a) Update Θ, V, Z: i. Θ = arg min over Θ ∈ 𝒮 of {ℓ(X, Θ) + (ρ/2)‖Θ − Θ̃ + W1‖F²}. ii. Z = S(Z̃ − W3, λ1/ρ), diag(Z) = diag(Z̃ − W3). Here S denotes the soft-thresholding operator, applied element-wise to a matrix: S(Aij, b) = sign(Aij) max(\|Aij\| − b, 0). iii. C = Ṽ − W2 − diag(Ṽ − W2). iv. Vj = max(1 − λ3 / (ρ‖S(Cj, λ2/ρ)‖₂), 0) · S(Cj, λ2/ρ) for j = 1, ..., p, where Vj and Cj are the jth columns of V and C. v. diag(V) = diag(Ṽ − W2). (b) Update Θ̃, Ṽ, Z̃: i. Γ = (ρ/6)[(Θ + W1) − (V + W2) − (V + W2)ᵀ − (Z + W3)]; ii. Θ̃ = Θ + W1 − (1/ρ)Γ; iii. Ṽ = (1/ρ)(Γ + Γᵀ) + V + W2; iv. Z̃ = (1/ρ)Γ + Z + W3. (c) Update W1, W2, W3: i. W1 = W1 + Θ − Θ̃; ii. W2 = W2 + V − Ṽ; iii. W3 = W3 + Z − Z̃. |
| Open Source Code | Yes | An R package hglasso is publicly available on the authors' websites and on CRAN. |
| Open Datasets | Yes | We illustrate our proposal on a webpage data set and a gene expression data set. ... We applied HGL to the university webpage data set from the World Wide Knowledge Base project at Carnegie Mellon University. This data set was pre-processed by Cardoso-Cachopo (2009). ... We applied HGL to a publicly available cancer gene expression data set (Verhaak et al., 2010). |
| Dataset Splits | No | The paper describes the generation of synthetic data and the characteristics of real-world datasets used (e.g., number of variables p, number of observations n) but does not provide details on specific training/test/validation splits for these datasets. For synthetic data, it refers to averaging results over '100 simulated data sets', which relates to repetitions rather than train/test splits within a single experiment. |
| Hardware Specification | Yes | On a 1.86 GHz Intel Core 2 Duo machine, the interior point method takes 3 minutes, while ADMM takes only 1 second, on a data set with p = 30. ... We ran experiments with p = 100, 200, 300 and with n = p/2 on a 2.26 GHz Intel Core 2 Duo machine. |
| Software Dependencies | No | The graphical lasso (5), implemented using the R package glasso. ... The neighborhood selection approach of Meinshausen and Bühlmann (2006), implemented using the R package glasso. ... Sparse partial correlation estimation procedure of Peng et al. (2009), implemented using the R package space. ... We compare the performance of HBN to the proposal of Höfling and Tibshirani (2009), implemented using the R package BMN. ... The paper mentions specific R packages used for implementing various methods (glasso, spcov, space, BMN) but does not provide specific version numbers for these packages or the R environment itself. |
| Experiment Setup | Yes | To obtain the curves shown in Figure 3, we fixed λ1 = 0.4, considered three values of λ3 (each shown in a different color in Figure 3), and used a fine grid of values of λ2. ... We fixed λ1 = 0.2, considered three values of λ3 (each shown in a different color), and varied λ2 in order to obtain the curves shown in Figure 6. ... For HBN, we fixed λ1 = 5, considered λ3 = {15, 25, 30}, and used a fine grid of values of λ2. ... we fix the tuning parameter that controls the sparsity of Z at λ1 = 0.45 ... we fix λ3 = 1.5 ... select a value of λ2 ranging from 0.1 to 0.5 ... We performed HGL with the selected tuning parameters λ1 = 0.45, λ2 = 0.25, and λ3 = 1.5. ... Since we are interested in identifying hub genes, and not as interested in identifying edges between non-hub nodes, we fix λ1 = 0.6 ... We fix λ3 = 6.5 ... select λ2 ranging from 0.1 to 0.7 ... We applied HGL with this set of tuning parameters ... λ1 = 0.6, λ2 = 0.4, λ3 = 6.5. |
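For reference, the problem that Algorithm 1 (Pseudocode row) solves can be written as below. This is our reading of problem (3), specialized to the column-wise ℓ2 hub penalty that step 2(a)iv implies, so it should be checked against the paper before being relied on. Here 𝒮 is the feasible set for Θ (positive definite matrices in the Gaussian case), and for a Gaussian graphical model we take ℓ(X, Θ) = −log det Θ + tr(SΘ), where S is the sample covariance matrix (not the soft-thresholding operator). This is the same loss the graphical lasso uses.

```latex
\begin{aligned}
\underset{\Theta \in \mathcal{S},\; V,\; Z}{\text{minimize}}\quad
  & \ell(X,\Theta)
    + \lambda_1 \lVert Z - \operatorname{diag}(Z) \rVert_1
    + \lambda_2 \lVert V - \operatorname{diag}(V) \rVert_1
    + \lambda_3 \sum_{j=1}^{p} \bigl\lVert \bigl(V - \operatorname{diag}(V)\bigr)_j \bigr\rVert_2 \\
\text{subject to}\quad
  & \Theta = V + V^{T} + Z .
\end{aligned}
```

Under this reading, λ1 controls sparsity of Z (edges that do not involve hubs), λ2 controls sparsity within the hub columns of V, and λ3 controls how many columns of V are nonzero, that is, how many hubs are selected. This is consistent with the Experiment Setup row, where λ1 and λ3 are fixed and λ2 is tuned.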
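The two closed-form steps in Algorithm 1 (Pseudocode row) can be checked with a short derivation. This is our own working, assuming the Gaussian loss ℓ(X, Θ) = −log det Θ + tr(SΘ) with S the sample covariance. It gives an explicit solution for step 2(a)i and recovers the factor ρ/6 in step 2(b)i. The shorthand A, B, D is introduced here for convenience and does not appear in the paper.

```latex
% Step 2(a)i: zero gradient of -log det(Theta) + tr(S Theta) + (rho/2)||Theta - tilde Theta + W_1||_F^2
\begin{aligned}
-\Theta^{-1} + S + \rho(\Theta - \tilde\Theta + W_1) = 0
  &\iff \rho\Theta - \Theta^{-1} = \rho(\tilde\Theta - W_1) - S =: Q\Lambda Q^{T} \\
  &\iff \Theta = Q \operatorname{diag}(\theta_1,\dots,\theta_p) Q^{T},
  \qquad \theta_i = \frac{\Lambda_{ii} + \sqrt{\Lambda_{ii}^2 + 4\rho}}{2\rho} > 0 .
\end{aligned}

% Step 2(b): with A = Theta + W_1, B = V + W_2, D = Z + W_3, project (A, B, D)
% onto {tilde Theta = tilde V + tilde V^T + tilde Z} with multiplier Gamma.
\begin{aligned}
\mathcal{L} &= \tfrac{\rho}{2}\Bigl(\lVert\tilde\Theta - A\rVert_F^2 + \lVert\tilde V - B\rVert_F^2
  + \lVert\tilde Z - D\rVert_F^2\Bigr)
  + \bigl\langle \Gamma,\ \tilde\Theta - \tilde V - \tilde V^{T} - \tilde Z \bigr\rangle \\
\nabla\mathcal{L} = 0 \;\Rightarrow\;
  & \tilde\Theta = A - \tfrac{1}{\rho}\Gamma, \quad
    \tilde V = B + \tfrac{1}{\rho}(\Gamma + \Gamma^{T}), \quad
    \tilde Z = D + \tfrac{1}{\rho}\Gamma \\
\text{constraint} \;\Rightarrow\;
  & A - B - B^{T} - D = \tfrac{1}{\rho}\bigl(4\Gamma + 2\Gamma^{T}\bigr) .
\end{aligned}
```

Because A and D are symmetric, the left-hand side of the last line is symmetric. Transposing that equation and subtracting it from the original gives Γ = Γᵀ, so Γ = (ρ/6)(A − B − Bᵀ − D), which is step 2(b)i. Substituting Γ into the middle line gives steps 2(b)ii to iv.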
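A minimal NumPy sketch of Algorithm 1 (Pseudocode row) for the Gaussian case follows. It assumes ℓ(X, Θ) = −log det Θ + tr(SΘ), which gives step 2(a)i an eigendecomposition closed form. The names `soft_threshold` and `hglasso_admm`, and the defaults for `rho`, `tau`, and `max_iter`, are our own illustrative choices. They are not the interface of the authors' hglasso R package, and the sketch is meant for reading the pseudocode, not as a substitute for that package.

```python
import numpy as np


def soft_threshold(A, b):
    """Element-wise soft-thresholding: sign(A_ij) * max(|A_ij| - b, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - b, 0.0)


def hglasso_admm(S, lam1, lam2, lam3, rho=1.0, tau=1e-5, max_iter=1000):
    """Sketch of Algorithm 1 for Gaussian HGL; S is the p x p sample covariance.

    Assumes l(X, Theta) = -log det(Theta) + trace(S @ Theta).
    Defaults for rho, tau and max_iter are illustrative, not from the paper.
    Returns the primal iterates (Theta, V, Z) at termination.
    """
    p = S.shape[0]
    # Step 1: primal variables and their "tilde" copies start at I, duals at 0.
    Theta, V, Z = np.eye(p), np.eye(p), np.eye(p)
    Theta_t, V_t, Z_t = np.eye(p), np.eye(p), np.eye(p)
    W1, W2, W3 = np.zeros((p, p)), np.zeros((p, p)), np.zeros((p, p))

    for _ in range(max_iter):
        Theta_old = Theta

        # 2(a)i: setting the gradient to zero gives
        # rho*Theta - inv(Theta) = rho*(Theta_t - W1) - S, solved per eigenvalue.
        evals, Q = np.linalg.eigh(rho * (Theta_t - W1) - S)
        theta = (evals + np.sqrt(evals ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta) @ Q.T  # Q diag(theta) Q^T, positive definite

        # 2(a)ii: soft-threshold Z; its diagonal is left unpenalized.
        Z = soft_threshold(Z_t - W3, lam1 / rho)
        np.fill_diagonal(Z, np.diag(Z_t - W3))

        # 2(a)iii: C is V_t - W2 with its diagonal zeroed.
        C = V_t - W2
        np.fill_diagonal(C, 0.0)
        # 2(a)iv: element-wise soft-thresholding, then group shrinkage per column.
        SC = soft_threshold(C, lam2 / rho)
        norms = np.linalg.norm(SC, axis=0)
        scale = np.zeros(p)
        nonzero = norms > 0  # all-zero columns stay zero
        scale[nonzero] = np.maximum(1.0 - lam3 / (rho * norms[nonzero]), 0.0)
        V = SC * scale  # multiplies column j by scale[j]
        # 2(a)v: restore the unpenalized diagonal of V.
        np.fill_diagonal(V, np.diag(V_t - W2))

        # 2(b): project onto {Theta_t = V_t + V_t^T + Z_t}.
        Gamma = (rho / 6.0) * ((Theta + W1) - (V + W2) - (V + W2).T - (Z + W3))
        Theta_t = Theta + W1 - Gamma / rho
        V_t = (Gamma + Gamma.T) / rho + V + W2
        Z_t = Gamma / rho + Z + W3

        # 2(c): dual updates.
        W1 = W1 + Theta - Theta_t
        W2 = W2 + V - V_t
        W3 = W3 + Z - Z_t

        # Stopping rule: relative squared Frobenius change in Theta.
        change = np.linalg.norm(Theta - Theta_old) ** 2 / np.linalg.norm(Theta_old) ** 2
        if change <= tau:
            break

    return Theta, V, Z
```

For an n × p data matrix `X`, one possible call is `hglasso_admm(np.cov(X, rowvar=False), lam1, lam2, lam3)`. Under the formulation in which Θ = V + Vᵀ + Z, columns of the returned `V` with nonzero off-diagonal entries are the candidate hub nodes.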
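The Research Type row reports the proportion of correctly estimated hub nodes with r = 40. The sketch below shows one way to compute such a score. It assumes that a node counts as an estimated hub when it has at least r nonzero off-diagonal entries in the estimated precision matrix; the paper's simulation section should be consulted for the exact definition. The function name `hub_recovery_proportion` and the `tol` threshold are hypothetical.

```python
import numpy as np


def hub_recovery_proportion(Theta_hat, true_hubs, r, tol=1e-8):
    """Fraction of true hubs with at least r estimated edges (assumed definition).

    An edge (i, j), i != j, counts as estimated when |Theta_hat[i, j]| > tol.
    """
    adjacency = np.abs(Theta_hat) > tol
    np.fill_diagonal(adjacency, False)  # self-loops are not edges
    degree = adjacency.sum(axis=0)
    estimated_hubs = set(np.flatnonzero(degree >= r).tolist())
    return len(estimated_hubs & set(true_hubs)) / len(true_hubs)


# Toy 6-node example: node 0 links to every other node, node 1 only to nodes 0 and 2.
T = np.eye(6)
T[0, 1:] = T[1:, 0] = 0.3
T[1, 2] = T[2, 1] = 0.2
print(hub_recovery_proportion(T, true_hubs=[0, 1], r=3))  # 0.5
```

With r = 3 only node 0 qualifies, so one of the two true hubs is recovered. Lowering r to 2 would also admit nodes 1 and 2, which shows how the threshold r trades off missed hubs against false ones.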