Jointly Informative Feature Selection Made Tractable by Gaussian Modeling

Authors: Leonidas Lefakis, François Fleuret

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "An empirical evaluation using several types of classifiers on multiple data sets show that this class of methods outperforms state-of-the-art baselines, both in terms of speed and classification accuracy. Keywords: feature selection, mutual information, entropy, mixture of Gaussians." ... "In this section we present an empirical evaluation of the proposed algorithms. We first show on a synthetic controlled experiment that they behave as expected regarding groups of jointly informative features, and then provide results obtained on three popular real-world computer vision data sets."
Researcher Affiliation | Collaboration | Leonidas Lefakis (EMAIL), Zalando Research, Zalando SE, Berlin, Germany; François Fleuret (EMAIL), Computer Vision and Learning group, Idiap Research Institute, Martigny, Switzerland.
Pseudocode | Yes | Table 2: Greedy Forward Subset Selection

S_0 ← ∅
for n = 1 ... N do
    s* ← 0
    for X_j ∈ F \ S_{n-1} do
        S ← S_{n-1} ∪ {X_j}
        s ← I(S; Y)
        if s > s* then
            s* ← s; S* ← S
        end if
    end for
    S_n ← S*
end for
return S_N
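The greedy forward subset selection of Table 2 can be sketched in a few lines. The sketch below follows the paper's Gaussian-modeling idea in spirit: it fits a Gaussian to each class-conditional p(S | Y = y) and moment-matches a single Gaussian to the marginal p(S), so that I(S; Y) = H(S) − H(S | Y) reduces to log-determinants of covariance matrices. The regularization constant and all function names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a Gaussian with covariance `cov` (nats)."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def mutual_information(X, y, subset, eps=1e-6):
    """I(S; Y) with Gaussian class-conditionals and a moment-matched
    Gaussian approximation of the marginal p(S). `eps` is an assumed
    ridge term to keep covariances well-conditioned."""
    Xs = X[:, subset]
    reg = eps * np.eye(len(subset))
    h_marginal = gaussian_entropy(np.cov(Xs, rowvar=False) + reg)
    h_conditional = 0.0
    for c in np.unique(y):
        Xc = Xs[y == c]
        h_conditional += (len(Xc) / len(Xs)) * gaussian_entropy(
            np.cov(Xc, rowvar=False) + reg)
    return h_marginal - h_conditional

def greedy_forward_selection(X, y, N):
    """Table 2: at each step, add the feature maximizing I(S; Y)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(N):
        best_score, best_j = -np.inf, None
        for j in remaining:
            score = mutual_information(X, y, selected + [j])
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Each iteration scores every remaining candidate jointly with the features already chosen, which is what makes the selection "jointly informative" rather than a per-feature ranking.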
Open Source Code | No | The paper mentions using 'the code provided by the authors' for pre-processing external datasets (Coates and Ng, 2011) and discusses its own 'C++ implementations' for performance comparison. However, there is no explicit statement or link indicating that the authors have made their own code publicly available for the methodology described in this paper.
Open Datasets | Yes | We report results on three standard computer vision data sets which we used for our experiments: CIFAR-10 contains images of size 32×32 of 10 distinct classes depicting vehicles and animals. ... INRIA is a pedestrian detection data set. ... STL-10 consists of images of size 96×96 belonging to 10 classes, each represented by 500 training images. As for CIFAR we pre-process the data as in (Coates and Ng, 2011), resulting in a pool F of 4,096 features.
Dataset Splits | No | The paper states: 'CIFAR-10... The training data consists of 5,000 images of each class.' and 'STL-10... each represented by 500 training images.' and 'INRIA... 12,180 training images'. It also discusses 'selecting uniformly at random without replacement' for a finite sample analysis. However, it does not explicitly provide the training/test/validation splits (e.g., percentages or specific counts) for the main experimental results, nor does it refer to standardized splits for all datasets with citations.
Hardware Specification | No | The paper states: 'The computation times provided were obtained with C++ implementations of the proposed methods.' While it discusses CPU time in Table 3, it does not specify any particular CPU model, GPU, or other hardware details (e.g., processor type, memory amount) used for running the experiments.
Software Dependencies | No | The paper mentions the use of C++ implementations for the proposed methods and MRMR, MATLAB for the Spectral and CMTF baselines, and Java for other algorithms. However, it does not provide specific version numbers for any of these programming languages, libraries, or frameworks to ensure reproducibility.
Experiment Setup | No | The paper mentions combining selected features with 'four different classifiers: AdaBoost with classification stumps, linear SVM, RBF-kernel SVM, and quadratic discriminant analysis (QDA)'. It also states results are shown for 'several numbers of selected features {10, 25, 50, 100}'. However, it does not provide specific hyperparameters for these classifiers (e.g., learning rates, C values, kernel parameters, number of boosting rounds) or other detailed training configurations required for reproducibility.
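To make concrete what a reproduction of this setup would have to pin down, here is a scikit-learn sketch of the four classifier families evaluated on a selected feature subset. Every hyperparameter value below (n_estimators, C, gamma, reg_param) is a placeholder assumption; the paper does not report the values it used:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def evaluate_selected_features(X_train, y_train, X_test, y_test, selected):
    """Fit the four classifier families on a feature subset and return
    test accuracies. Hyperparameters are illustrative defaults only."""
    classifiers = {
        # AdaBoost's default base learner is a depth-1 tree, i.e. a stump.
        "AdaBoost (stumps)": AdaBoostClassifier(n_estimators=100),
        "linear SVM": SVC(kernel="linear", C=1.0),
        "RBF SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
        "QDA": QuadraticDiscriminantAnalysis(reg_param=0.1),
    }
    Xtr, Xte = X_train[:, selected], X_test[:, selected]
    return {name: clf.fit(Xtr, y_train).score(Xte, y_test)
            for name, clf in classifiers.items()}
```

A full reproduction would run this for each subset size in {10, 25, 50, 100}, which is why the missing hyperparameter values matter: accuracy differences between selection methods can be smaller than the spread induced by classifier settings.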