Variance estimation in graphs with the fused lasso
Authors: Oscar Hernan Madrid Padilla
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we fill the gaps described above regarding mean and variance estimation in general graphs. Our main contributions are listed next. [...] Section 4 contains numerical evaluations of the proposed methods in both simulated and real data. |
| Researcher Affiliation | Academia | Oscar Hernan Madrid Padilla EMAIL Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA 90095. |
| Pseudocode | Yes | The DFS algorithm proceeds as follows: Procedure DFS(G, v): Step 1: Label v as discovered. Step 2: For all w such that (w, v) ∈ E, if vertex w is not labeled, then recursively call DFS(G, w). |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Specifically, we consider the Ion channels data used by Jula Vanegas et al. (2021). |
| Dataset Splits | No | The paper describes generating synthetic data for simulations (e.g., 'generate 200 data sets' or specifying ranges for n and v0 in Scenarios 1-6) and using a single real-world dataset ('Ion channels data') without mentioning specific training, validation, or test splits. The context is estimation and evaluation, not typical machine learning model training. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions 'the function loess from the R package stats' but does not specify a version number for R or the package. This is insufficient for reproducible software dependencies. |
| Experiment Setup | Yes | To choose λ, inspired by Tibshirani and Taylor (2012), we use a Bayesian information criterion given as $\widehat{\mathrm{BIC}}(\lambda) := \|y - \hat{\theta}(\lambda)\|_2^2 + \widehat{\mathrm{df}}(\lambda)\log n$ (29), where $\widehat{\mathrm{df}}(\lambda)$ is the number of connected components induced by $\hat{\theta}(\lambda)$ in the graph G. Then we select the value of λ that minimizes $\widehat{\mathrm{BIC}}(\lambda)$. Once $\hat{\theta}(\lambda)$ has been computed, we proceed to select λ′ for (12). We let $\hat{\gamma}(\lambda')$ be the solution to (12) and $\mathrm{edf}(\lambda')$ be the number of connected components in G induced by $\hat{\gamma}(\lambda')$. Then we define $\widetilde{\mathrm{BIC}}(\lambda') := \sum_{i=1}^{n} [\min\{q, y_i^2\} - \hat{\gamma}(\lambda')_i]^2 + \mathrm{edf}(\lambda')\log n$ (30), where q is the 0.95-quantile of the data $\{y_i^2\}_{i=1}^n$. We use $\min\{q, y_i^2\}$ in (30) to avoid the influence of outliers in the model selection step. With the above score in hand, we choose the value of λ′ that minimizes $\widetilde{\mathrm{BIC}}(\lambda')$. In all our experiments, we select λ and λ′ from the set $\{10^1, 10^2, 10^3, 10^4, 10^5\}$. ... In our simulations, we set N = 5000, K is the Gaussian kernel, and $h_1 = h_2 = h$. We allow $h \in \{2^{-10}, 2^{-9}, \ldots, 2^{-1}\}$ and report results for the choice of h that gives the best performance in terms of estimating the true parameter $v_0$. |
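The DFS pseudocode quoted above is easily made runnable. A minimal Python sketch, assuming the graph is stored as an adjacency-list dict (a representation the paper does not specify):

```python
def dfs(graph, v, discovered=None):
    """Depth-first search from vertex v, returning the set of reachable vertices."""
    if discovered is None:
        discovered = set()
    discovered.add(v)              # Step 1: label v as discovered
    for w in graph[v]:             # Step 2: for all w with (w, v) in E
        if w not in discovered:    # if w is not labeled
            dfs(graph, w, discovered)  # recursively call DFS(G, w)
    return discovered
```

For example, on a graph with two components, `dfs` started at vertex 0 discovers only the component containing 0.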
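The BIC-based choice of λ in the Experiment Setup excerpt can be illustrated with a short sketch. The names `bic_score` and `select_by_bic` are hypothetical; computing the fused-lasso estimate and the number of connected components it induces in G is model- and graph-specific, so here the caller supplies each candidate fit and its degrees of freedom:

```python
import numpy as np

def bic_score(y, theta_hat, df):
    """BIC(lam) = ||y - theta_hat||^2 + df * log(n), matching the paper's eq. (29)."""
    n = len(y)
    return float(np.sum((y - theta_hat) ** 2) + df * np.log(n))

def select_by_bic(y, fits):
    """Return the lambda minimizing BIC.

    `fits` maps each candidate lambda (e.g. 10**1, ..., 10**5 as in the paper)
    to a pair (theta_hat, df), where df is the number of connected components
    theta_hat induces in the graph G.
    """
    return min(fits, key=lambda lam: bic_score(y, *fits[lam]))
```

A perfectly interpolating fit pays a larger `df * log(n)` penalty, so a coarser fit with fewer induced components can win despite a larger residual, which is exactly the trade-off the criterion encodes.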