Variance estimation in graphs with the fused lasso

Authors: Oscar Hernan Madrid Padilla

JMLR 2024

Reproducibility Variable — Result — LLM Response
Research Type — Experimental. "In this paper, we fill the gaps described above regarding mean and variance estimation in general graphs. Our main contributions are listed next. [...] Section 4 contains numerical evaluations of the proposed methods in both simulated and real data."
Researcher Affiliation — Academia. "Oscar Hernan Madrid Padilla, EMAIL, Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA 90095."
Pseudocode — Yes. "The DFS algorithm proceeds as follows: Procedure DFS(G, v): Step 1: Label v as discovered. Step 2: For all w such that (w, v) ∈ E do: if vertex w is not labeled as discovered, then recursively call DFS(G, w)."
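The quoted DFS procedure can be sketched as a short recursive function. This is a minimal illustration of the pseudocode, not the paper's implementation; the adjacency-list representation and the names `graph` and `discovered` are assumptions.

```python
def dfs(graph, v, discovered=None):
    """Depth-first search on a graph given as {vertex: [neighbors]}.

    Follows the quoted procedure: label v as discovered, then for every
    edge (w, v) recurse on any neighbor w not yet labeled.
    """
    if discovered is None:
        discovered = set()
    discovered.add(v)                 # Step 1: label v as discovered
    for w in graph.get(v, ()):        # Step 2: for all w with (w, v) in E
        if w not in discovered:       # if w is not labeled as discovered
            dfs(graph, w, discovered)
    return discovered
```

For example, `dfs({0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}, 0)` visits all four vertices of that connected graph.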
Open Source Code — No. The paper does not contain any explicit statement about providing open-source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets — Yes. "Specifically, we consider the Ion channels data used by Jula Vanegas et al. (2021)."
Dataset Splits — No. The paper describes generating synthetic data for simulations (e.g., "generate 200 data sets" or specifying ranges for n and v0 in Scenarios 1-6) and using a single real-world dataset (the Ion channels data) without mentioning specific training, validation, or test splits. The context is estimation and evaluation, not typical machine learning model training.
Hardware Specification — No. The paper does not provide any specific hardware details such as GPU/CPU models, memory, or processor types used for running its experiments.
Software Dependencies — No. The paper mentions "the function loess from the R package stats" but does not specify a version number for R or the package. This is insufficient for reproducible software dependencies.
Experiment Setup — Yes. "To choose λ, inspired by Tibshirani and Taylor (2012), we use a Bayesian information criterion given as BIC(λ) := ||y − θ̂(λ)||² + d̂f(λ) log n (29), where d̂f(λ) is the number of connected components induced by θ̂(λ) in the graph G. Then we select the value of λ that minimizes BIC(λ). Once θ̂(λ) has been computed, we proceed to select λ′ for (12). We let γ̂(λ′) be the solution to (12) and êdf(λ′) be the number of connected components in G induced by γ̂(λ′). Then we define BIC(λ′) := Σ_{i=1}^{n} [min{q, y_i²} − γ̂(λ′)_i]² + êdf(λ′) log n (30), where q is the 0.95-quantile of the data {y_i²}_{i=1}^{n}. We use min{q, y_i²} in (30) to avoid the influence of outliers in the model selection step. With the above score in hand, we choose the value of λ′ that minimizes BIC(λ′). In all our experiments, we select λ and λ′ from the set {10¹, 10², 10³, 10⁴, 10⁵}. ... In our simulations, we set N = 5000, K is the Gaussian kernel, and h1 = h2 = h. We allow h ∈ {2⁻¹⁰, 2⁻⁹, . . . , 2⁻¹} and report results for the choice of h that gives the best performance in terms of estimating the true parameter v0."
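The BIC-based tuning described in the quote (score = residual sum of squares plus a connected-components penalty times log n, minimized over a grid of λ values) can be sketched as follows. The helpers `fit` and `n_components` are hypothetical stand-ins for the fused-lasso solver and the routine counting connected components induced by the estimate; they are not from the paper.

```python
import numpy as np

def select_lambda(y, fit, n_components, lambdas=(1e1, 1e2, 1e3, 1e4, 1e5)):
    """Pick lambda minimizing BIC(lam) = ||y - theta_hat(lam)||^2
    + df_hat(lam) * log(n), as in the quoted criterion (29).

    `fit(y, lam)` is assumed to return the estimate theta_hat(lam);
    `n_components(theta)` is assumed to return the number of connected
    components the estimate induces in the graph.
    """
    n = len(y)
    best_lam, best_bic = None, np.inf
    for lam in lambdas:
        theta = fit(y, lam)
        bic = np.sum((y - theta) ** 2) + n_components(theta) * np.log(n)
        if bic < best_bic:
            best_lam, best_bic = lam, bic
    return best_lam
```

The grid `(1e1, ..., 1e5)` mirrors the set {10¹, ..., 10⁵} quoted above; criterion (30) for λ′ would follow the same loop with the truncated responses min{q, y_i²} in place of y.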