Variance estimation in graphs with the fused lasso
Authors: Oscar Hernan Madrid Padilla
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we fill the gaps described above regarding mean and variance estimation in general graphs. Our main contributions are listed next. [...] Section 4 contains numerical evaluations of the proposed methods in both simulated and real data. |
| Researcher Affiliation | Academia | Oscar Hernan Madrid Padilla EMAIL Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA 90095. |
| Pseudocode | Yes | The DFS algorithm proceeds as follows: Procedure DFS(G, v): Step 1: Label v as discovered. Step 2: For all w such that (w, v) ∈ E, if vertex w is not labeled, then recursively call DFS(G, w). |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Specifically, we consider the Ion channels data used by Jula Vanegas et al. (2021). |
| Dataset Splits | No | The paper describes generating synthetic data for simulations (e.g., 'generate 200 data sets' or specifying ranges for n and v0 in Scenarios 1-6) and using a single real-world dataset ('Ion channels data') without mentioning specific training, validation, or test splits. The context is estimation and evaluation, not typical machine learning model training. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions 'the function loess from the R package stats' but does not specify a version number for R or the package. This is insufficient for reproducible software dependencies. |
| Experiment Setup | Yes | To choose λ, inspired by Tibshirani and Taylor (2012), we use a Bayesian information criterion given as $\widehat{\mathrm{BIC}}(\lambda) := \|y - \hat{\theta}(\lambda)\|_2^2 + \widehat{\mathrm{df}}(\lambda)\log n$ (29), where $\widehat{\mathrm{df}}(\lambda)$ is the number of connected components induced by $\hat{\theta}(\lambda)$ in the graph G. Then we select the value of λ that minimizes $\widehat{\mathrm{BIC}}(\lambda)$. Once $\hat{\theta}(\lambda)$ has been computed, we proceed to select λ′ for (12). We let $\hat{\gamma}(\lambda')$ be the solution to (12) and $\mathrm{edf}(\lambda')$ be the number of connected components in G induced by $\hat{\gamma}(\lambda')$. Then we define $\widetilde{\mathrm{BIC}}(\lambda') := \sum_{i=1}^{n} [\min\{q, y_i^2\} - \hat{\gamma}(\lambda')_i]^2 + \mathrm{edf}(\lambda')\log n$ (30), where q is the 0.95-quantile of the data $\{y_i^2\}_{i=1}^n$. We use $\min\{q, y_i^2\}$ in (30) to avoid the influence of outliers in the model selection step. With the above score in hand, we choose the value of λ′ that minimizes $\widetilde{\mathrm{BIC}}(\lambda')$. In all our experiments, we select λ and λ′ from the set $\{10^1, 10^2, 10^3, 10^4, 10^5\}$. ... In our simulations, we set N = 5000, K is the Gaussian kernel, and $h_1 = h_2 = h$. We allow $h \in \{2^{-10}, 2^{-9}, \ldots, 2^{-1}\}$ and report results for the choice of h that gives the best performance in terms of estimating the true parameter $v_0$. |
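The DFS pseudocode quoted above is easily made runnable. A minimal Python sketch, assuming the graph is stored as an adjacency-list dict (a representation the paper does not specify):

```python
def dfs(graph, v, discovered=None):
    """Depth-first search from vertex v, returning the set of reachable vertices."""
    if discovered is None:
        discovered = set()
    discovered.add(v)              # Step 1: label v as discovered
    for w in graph[v]:             # Step 2: for all w with (w, v) in E
        if w not in discovered:    # if w is not labeled
            dfs(graph, w, discovered)  # recursively call DFS(G, w)
    return discovered
```

For example, on a graph with two components, `dfs` started at vertex 0 discovers only the component containing 0.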
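The BIC-based choice of λ in the Experiment Setup excerpt can be illustrated with a short sketch. The names `bic_score` and `select_by_bic` are hypothetical; computing the fused-lasso estimate and the number of connected components it induces in G is model- and graph-specific, so here the caller supplies each candidate fit and its degrees of freedom:

```python
import numpy as np

def bic_score(y, theta_hat, df):
    """BIC(lam) = ||y - theta_hat||^2 + df * log(n), matching the paper's eq. (29)."""
    n = len(y)
    return float(np.sum((y - theta_hat) ** 2) + df * np.log(n))

def select_by_bic(y, fits):
    """Return the lambda minimizing BIC.

    `fits` maps each candidate lambda (e.g. 10**1, ..., 10**5 as in the paper)
    to a pair (theta_hat, df), where df is the number of connected components
    theta_hat induces in the graph G.
    """
    return min(fits, key=lambda lam: bic_score(y, *fits[lam]))
```

A perfectly interpolating fit pays a larger `df * log(n)` penalty, so a coarser fit with fewer induced components can win despite a larger residual, which is exactly the trade-off the criterion encodes.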