Adaptive Sampling for Large Scale Boosting
Authors: Charles Dubout, François Fleuret
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in image classification and object recognition on four standard computer vision data sets show that the adaptive methods we propose outperform basic sampling and state-of-the-art bandit methods. |
| Researcher Affiliation | Academia | Computer Vision and Learning Group Idiap Research Institute CH-1920 Martigny, Switzerland |
| Pseudocode | Yes | Algorithm 1 The Tasting 1.Q algorithm first samples uniformly R features from every feature subset Fk. It uses these features at every boosting step to find the optimal feature subset k from which to sample. After the selection of the Q features, the algorithm continues like AdaBoost. Algorithm 2 The Tasting Q.1 algorithm first samples uniformly R features from every feature subset Fk. It uses them to find the optimal subset kq for every one of the Q features to sample at every boosting step. After the selection of the Q features, the algorithm continues like AdaBoost. Algorithm 3 The M.A.S. naive algorithm models the current edge distribution with a Gaussian mixture model fitted on the edges estimated at the previous iteration. It uses this density model to compute the pair (Q, S) maximizing the expectation of the true edge of the selected learner E[ϵ], and then samples the corresponding number of weak learners and training examples, before keeping the weak learner with the highest approximated edge. After the selection of the Q features, the algorithm continues like AdaBoost. Algorithm 4 The Laminating algorithm starts by sampling Q weak learners and S examples at the beginning of every boosting iteration, and refines those by successively halving the number of learners and doubling the number of examples until only one learner remains. After the selection of the Q features, the algorithm continues like AdaBoost. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide any links to a code repository. The conclusion section discusses future extensions but not code availability. |
| Open Datasets | Yes | The first data set that we used is the MNIST handwritten digits database (LeCun et al., 1998). ... The second data set that we used is the INRIA Person data set (Dalal and Triggs, 2005). ... The third data set that we used is Caltech 101 (Fei-Fei et al., 2004) ... The fourth and last data set that we used is CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | Yes | The first data set that we used is the MNIST handwritten digits database (LeCun et al., 1998). It is composed of 10 classes and its training and testing sets consist respectively of 60,000 and 10,000 grayscale images of resolution 28×28 pixels... The second data set that we used is the INRIA Person data set (Dalal and Triggs, 2005). It is composed of a training and a testing set respectively of 2,418 and 1,126 color images... The third data set that we used is Caltech 101 (Fei-Fei et al., 2004)... We sampled 15 training examples and 20 distinct test examples from every class, as advised on the data set website. The fourth and last data set that we used is CIFAR-10 (Krizhevsky, 2009)... its training and testing sets consist respectively of 50,000 and 10,000 color images... |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. It only discusses the experimental setup in terms of algorithms and datasets. |
| Software Dependencies | No | The paper mentions using "Ada Boost.MH algorithm (Schapire and Singer, 1999) with decision stumps as weak learners" and several bandit algorithms (UCB, Exp3.P, ϵ-greedy), but it does not specify any software libraries or their version numbers for implementation. |
| Experiment Setup | Yes | We used the Ada Boost.MH algorithm (Schapire and Singer, 1999) with decision stumps as weak learners to be able to use all methods in the same conditions. ... We set the maximum cost of all the algorithms to 10N, setting Q = 10 and S = N for the baselines, as this configuration leads to the best results after 10,000 boosting rounds. ... We set the values of the parameters of Exp3.P to η = 0.3 and λ = 0.15 as recommended in (Busa-Fekete and Kégl, 2010). |
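The Tasting 1.Q caption quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `tasting_1q`, the dictionary of feature subsets, and the `edge(f)` callback (standing in for an edge estimate under the current boosting weights) are all assumptions made for the sketch.

```python
import random

def tasting_1q(feature_subsets, r, q, edge, rounds):
    """Sketch of the Tasting 1.Q scheme.

    Once at the start, taste r features uniformly from every subset.
    At each boosting round, the tasted features are scored to pick the
    most promising subset, and the q features for that round are then
    sampled from it; the best of those q is kept, as AdaBoost would.
    """
    # One-time uniform "tasting" of r features per subset.
    tasted = {k: random.sample(fs, min(r, len(fs)))
              for k, fs in feature_subsets.items()}
    chosen = []
    for _ in range(rounds):
        # Pick the subset whose tasted features currently look best.
        best_k = max(tasted, key=lambda k: max(edge(f) for f in tasted[k]))
        pool = feature_subsets[best_k]
        candidates = random.sample(pool, min(q, len(pool)))
        # Keep the strongest of the q sampled features.
        chosen.append(max(candidates, key=edge))
    return chosen
```

With a toy `edge` that just returns the feature value, the scheme reliably concentrates its sampling on whichever subset contains the high-edge features.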