Beyond Trees: Classification with Sparse Pairwise Dependencies
Authors: Yaniv Tenzer, Amit Moscovich, Mary Frances Dorn, Boaz Nadler, Clifford Spiegelman
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the predictive accuracy of SLB to several widely used classifiers both on synthetic data and on various public data sets. As our empirical results show, when the underlying distribution is forest structured, the accuracy of our classifier is comparable to methods that explicitly assume a forest structure. However, when the underlying distributions are not forest structured, but instead follow more complicated Bayesian network models, our more flexible method often outperforms the other methods. Furthermore, our experiments highlight the importance of incorporating bivariate features. Finally, as we illustrate with several real data sets, SLB is competitive to popular classifiers. (Section 5: Experimental Results) |
| Researcher Affiliation | Academia | Yaniv Tenzer EMAIL Department of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, 76100, Israel; Amit Moscovich EMAIL Program in Applied and Computational Mathematics Princeton University Princeton, NJ 08544, USA; Mary Frances Dorn EMAIL Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545, USA; Boaz Nadler EMAIL Department of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, 76100, Israel; Clifford Spiegelman Department of Statistics Texas A&M University College Station, TX 77843, USA. All affiliations are with academic institutions or national research laboratories, indicating an academic classification. |
| Pseudocode | Yes | Algorithm 1: The Sparse Log-Bivariate Density Classifier (SLB). Input: sample (x_1, y_1), …, (x_n, y_n) of feature vectors x ∈ ℝ^d and labels y ∈ {−1, +1}. Step 1: For each class y ∈ {+1, −1}, estimate the statistical dependencies ĤSIC between all pairs of variables i, j ∈ {1, …, d}, and filter out weakly dependent pairs of variables. Step 2: Compute all univariate density estimates p̂_y(x_i), and the bivariate density estimates p̂_y(x_i, x_j) for the variable pairs that were not filtered in the previous step. Step 3: Fit a linear SVM (ŵ, b̂) to the transformed samples {(T̂(x_i), y_i)}, where T̂ is the log-density transformation defined in Eq. (5). Output: classifier given by x ↦ sign(ŵᵀ T̂(x) − b̂). |
| Open Source Code | No | The paper mentions that "All compared methods were implemented in the R programming language. When required, univariate or bivariate densities were estimated by the ks package. SLB: Our sparse log-bivariate density classifier. We estimated all pairwise HSIC values using the dHSIC package with a Gaussian kernel. The SVM classifier was constructed by the e1071 package with a default parameter λ = 1/2." This describes the software and packages used but does not provide access to the authors' implementation of the SLB classifier. |
| Open Datasets | Yes | We next evaluate the various classifiers on 16 real data sets, publicly available at the UCI Machine Learning and Kaggle1 Databases (Dheeru and Karra Taniskidou, 2017). The footnote indicates 1. www.kaggle.com. |
| Dataset Splits | Yes | The misclassification error was estimated by 5-fold cross-validation, with the folds sampled in a stratified manner so that they have approximately the same proportions of class labels as the full data set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It only mentions the programming language and packages used. |
| Software Dependencies | No | All compared methods were implemented in the R programming language. When required, univariate or bivariate densities were estimated by the ks package. SLB: Our sparse log-bivariate density classifier. We estimated all pairwise HSIC values using the dHSIC package with a Gaussian kernel. The SVM classifier was constructed by the e1071 package with a default parameter λ = 1/2. The paper lists several software packages but does not specify their version numbers, which is required for a reproducible description. |
| Experiment Setup | No | The paper mentions that the e1071 package was used "with a default parameter λ = 1/2" for SLB and "by default λ = 1" for SVM RBF, and that Random Forest used "50 trees and the default number of √d randomly selected variables at each split." However, these are only the default settings of the chosen classifiers. The paper does not provide a comprehensive list of hyperparameter values or system-level training settings for the proposed method (SLB) or the comparison methods, which would be needed to fully reproduce the experimental setup. |
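The three steps of Algorithm 1 (pairwise HSIC screening, per-class density estimation, and a linear SVM on a log-density transform) can be sketched in Python. This is a simplified illustration, not the authors' R implementation: it uses a basic biased HSIC estimator and `scipy`/`scikit-learn` in place of the dHSIC, ks, and e1071 packages, and the feature transform stacks per-class log-densities rather than reproducing Eq. (5) exactly. The function names (`hsic`, `slb_fit`, `slb_transform`) and the `threshold` parameter are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.svm import LinearSVC


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two 1-D samples, Gaussian kernels."""
    n = len(x)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma**2))
    L = np.exp(-(y[:, None] - y[None, :]) ** 2 / (2 * sigma**2))
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / n**2


def slb_fit(X, y, threshold=1e-3):
    classes = np.unique(y)
    d = X.shape[1]
    # Step 1: keep variable pairs whose within-class HSIC exceeds the threshold.
    pairs = [
        (i, j)
        for i in range(d)
        for j in range(i + 1, d)
        if any(hsic(X[y == c, i], X[y == c, j]) > threshold for c in classes)
    ]
    # Step 2: per-class univariate KDEs, plus bivariate KDEs for surviving pairs.
    uni = {c: [gaussian_kde(X[y == c, i]) for i in range(d)] for c in classes}
    biv = {c: {p: gaussian_kde(X[y == c][:, p].T) for p in pairs} for c in classes}
    # Step 3: fit a linear SVM on the log-density feature transform.
    svm = LinearSVC(max_iter=10000).fit(slb_transform(X, classes, uni, biv, pairs), y)
    return classes, uni, biv, pairs, svm


def slb_transform(X, classes, uni, biv, pairs):
    """Stack per-class log-densities as features (simplified stand-in for Eq. (5))."""
    feats = []
    for c in classes:
        for i, kde in enumerate(uni[c]):
            feats.append(np.log(kde(X[:, i]) + 1e-12))
        for p in pairs:
            feats.append(np.log(biv[c][p](X[:, p].T) + 1e-12))
    return np.column_stack(feats)
```

A quick usage example on two well-separated Gaussian classes: fit with `slb_fit`, then classify by applying the SVM to `slb_transform` of new points.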