Trend Filtering on Graphs
Authors: Yu-Xiang Wang, James Sharpnack, Alexander J. Smola, Ryan J. Tibshirani
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the merits of graph trend filtering through both examples and theory. As a motivating example, consider a denoising problem on 402 census tracts of Allegheny County, PA, arranged into a graph with 402 vertices and 2382 edges obtained by connecting spatially adjacent tracts. To illustrate the adaptive property of graph trend filtering we generated an artificial signal with inhomogeneous smoothness across the nodes, and two sharp peaks near the center of the graph, as can be seen in the top left panel of Figure 1. (The signal was formed using a mixture of five Gaussians, in the underlying spatial coordinates.) We drew noisy observations around this signal, shown in the top right panel of the figure, and we fit graph trend filtering, graph Laplacian smoothing, and wavelet smoothing to these observations. … Figure 2: Mean squared errors for the Allegheny County example. Results were averaged over 10 simulations; the bars denote one standard error. … Section 5 (Examples): In this section, we present a variety of examples of running graph trend filtering on real graphs. … Figure 5: Performance of GTF and others for three generative models on the Facebook graph. … Table 1: Misclassification rates of MAD-Laplacian and MAD-GTF for imputation in the UCI data sets. |
| Researcher Affiliation | Collaboration | Yu-Xiang Wang (1,2), James Sharpnack (3), Alexander J. Smola (1,4), Ryan J. Tibshirani (1,2). Affiliations: 1: Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213; 2: Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213; 3: Mathematics Department, University of California at San Diego, La Jolla, CA 10280; 4: Marianas Labs, Pittsburgh, PA 15213 |
| Pseudocode | No | The paper describes computational approaches like ADMM and Newton methods in Section 4, but it does so in narrative text and mathematical equations without presenting any structured pseudocode or algorithm blocks (e.g., a labeled Algorithm 1 figure or a dedicated pseudocode section). |
| Open Source Code | No | The paper mentions using wavelet implementations released by other authors (Coifman and Maggioni, 2006; Irion, 2015), but it does not release source code for its own methods or provide a repository link. |
| Open Datasets | Yes | Here we examine the behavior of graph trend filtering on a nonplanar graph: the Facebook graph from the Stanford Network Analysis Project (http://snap.stanford.edu). Our specification of MAD-GTF only deviates from the MAD proposal of Talukdar and Crammer (2009) in that these authors used the Laplacian regularization term $\sum_{j=1}^K \beta_j^\top L \beta_j$, in place of the ℓ1-based graph difference regularizer in (11). … As a broad comparison of the two methods, we ran them on the 11 most popular classification data sets from the UCI Machine Learning repository (http://archive.ics.uci.edu/ml/). … This data set was kindly provided by the authors of Doraiswamy et al. (2014), who obtained the data from the NYC Taxi & Limousine Commission. |
| Dataset Splits | No | The paper describes how synthetic measurements were generated and how 'seed labels' were randomly selected for semi-supervised learning experiments. For example, Section 5.2 states: 'randomly selected 5 seeds per class to serve as the observed class labels' and 'averaged over 10 repetitions of the randomly selected seed labels'. However, it does not provide specific training/validation/test splits (e.g., percentages, counts, or explicit file names) for any fixed dataset; it instead describes a sampling methodology or simulation setup. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. It only generally describes experimental performance and convergence characteristics of algorithms. |
| Software Dependencies | No | The paper mentions various algorithms and methods like 'ADMM', 'Newton methods', 'parametric max-flow', and 'soft-thresholding', and also refers to 'wavelet implementations released by the authors of Coifman and Maggioni (2006); Irion (2015)'. However, it does not provide specific version numbers for any of these software components, libraries, or programming languages used in their own implementation. |
| Experiment Setup | Yes | For each data set, we constructed a 5-nearest-neighbor graph based on the Euclidean distance between provided features, and randomly selected 5 seeds per class to serve as the observed class labels. Then we set ϵ = 0.01, used prior weights Rij = 1/K for all i and j, and chose the tuning parameter λ over a wide grid of values to represent the best achievable performance by each method, on each experiment. For a qualitative visual comparison, the smoothing parameter λ1 was chosen so that both methods have 200 degrees of freedom (without any sparsity imposed). The sparsity parameter was then set as λ2 = 0.2. |
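The experiment-setup row quotes the paper's construction of a 5-nearest-neighbor graph from Euclidean distances between feature vectors. The paper does not publish its code, so the sketch below is a minimal, hedged reconstruction of that standard construction (the function name and the symmetrization convention, connecting i and j if either is a k-NN of the other, are our assumptions, not the authors'):

```python
import numpy as np

def knn_graph_edges(X, k):
    """Edge set of a k-nearest-neighbor graph over the rows of X
    (Euclidean distance). Edges are symmetrized: i~j if either node
    is among the other's k nearest neighbors. Illustrative sketch,
    not the paper's implementation."""
    n = X.shape[0]
    # Full pairwise distance matrix; fine for small n.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    edges = set()
    for i in range(n):
        # argsort puts the point itself (distance 0) first; skip it.
        nbrs = np.argsort(D[i])[1:k + 1]
        for j in nbrs:
            edges.add((int(min(i, j)), int(max(i, j))))
    return sorted(edges)
```

On two well-separated pairs of points, `knn_graph_edges(X, 1)` links each point only to its nearest partner.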
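The research-type row contrasts graph trend filtering with graph Laplacian smoothing as a denoising baseline on the Allegheny County graph. Laplacian smoothing has a closed form, solving the ridge-type system (I + λL)β = y for the combinatorial graph Laplacian L. A minimal sketch under that standard formulation, with a toy chain graph standing in for the county adjacency graph (data and names are illustrative, not from the paper):

```python
import numpy as np

def laplacian_smoothing(y, edges, n, lam):
    """Denoise a graph signal y by solving (I + lam * L) beta = y,
    where L is the combinatorial Laplacian of the given edge list.
    Illustrative baseline, not the paper's code."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return np.linalg.solve(np.eye(n) + lam * L, y)
```

Because the rows of (I + λL)⁻¹ are nonnegative and sum to one, the smoothed signal is a node-wise convex combination of the observations: it preserves the total mass of y and stays within its range.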
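The pseudocode row notes that the paper describes its ADMM solver only in narrative text. For orientation, here is a generic ADMM sketch for the generalized-lasso form that graph trend filtering takes, minimize ½‖y − β‖² + λ‖Dβ‖₁, where D is a graph difference operator; the splitting and soft-thresholding steps are the textbook ADMM updates for this problem class, not code from the paper, and the dense solve is only suitable for small graphs:

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def gtf_admm(y, D, lam, rho=1.0, iters=200):
    """Textbook ADMM sketch for minimize 0.5*||y - beta||^2 + lam*||D beta||_1.
    Splits as z = D beta with scaled dual variable u. Illustrative only."""
    n, m = len(y), D.shape[0]
    beta = y.copy()
    z = np.zeros(m)
    u = np.zeros(m)
    A = np.eye(n) + rho * (D.T @ D)  # factor once; beta-update system matrix
    for _ in range(iters):
        beta = np.linalg.solve(A, y + rho * D.T @ (z - u))
        z = soft_threshold(D @ beta + u, lam / rho)
        u = u + D @ beta - z
    return beta
```

With a first-difference operator D on a chain and a very large λ, the ℓ1 penalty forces Dβ toward zero, so the solution approaches the constant signal at the mean of y, a quick sanity check on the updates.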