Ensembles of Classifiers: a Bias-Variance Perspective
Authors: Neha Gupta, Jamie Smith, Ben Adlam, Zelda E Mariet
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then investigate the empirical behavior of ensembles of neural networks. We first contrast the behaviors of the two natural ensembling choices that arise from the generalized BVD. Empirically, we see that such ensembles of neural networks may reduce the bias. We conclude with an empirical analysis of ensembles over neural network architecture hyperparameters, revealing that these techniques allow for more efficient bias reduction than standard ensembles. |
| Researcher Affiliation | Collaboration | Neha Gupta EMAIL Stanford University Jamie Smith EMAIL Google Ben Adlam EMAIL Google Zelda Mariet EMAIL Google |
| Pseudocode | Yes | Algorithm 1 (Bootstrap estimate of bias or variance). Input: training set T, number of bootstrap samples B. For i in {1, ..., B}: T_i ← uniform_sample(T) of size \|T\|; for j in {1, ..., B}: T_ij ← uniform_sample(T_i) of size \|T\|; b(2)_i ← bias({T_ij}_j), the bootstrap estimate for T_i. Then b(1) ← bias({T_i}_i), the bootstrap estimate for T; b(2) ← (1/B) Σ_i b(2)_i; t ← b(1)/b(2), the corrective term; b(0) ← t·b(1). Return the bias estimate b(0). |
| Open Source Code | No | The paper does not contain any explicit statements about the release of its own source code, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Empirically, we see on Cifar10 and Cifar100 that primal ensembling neural networks under the cross-entropy loss tends to decrease the bias... Figure 3c stratifies the decomposition by corruption intensity on the corrupted Cifar-100 (Hendrycks & Dietterich, 2019) test set. Figure 8: Conditional bias, variance, and NLL of primal and dual WRN-28-10 ensembles on the SVHN test set (left) and the corrupted Cifar10 and Cifar100 test sets (averaged over all corruptions). |
| Dataset Splits | Yes | To compare the conditional and bootstrapped estimates, we trained Wide ResNets (WRNs) (Zagoruyko & Komodakis, 2016) with the cross-entropy loss on different disjoint partitionings of the CIFAR-10 dataset. Figure 1a uses 50 partitions of 1k training points, and Figure 1b uses 20 partitions of 2.5k training points. We use the learning rate schedule, batch size, and data augmentations specified in the deterministic baseline provided by Nado et al. (2021). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other accelerator specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of Wide Residual Networks (WRN-28-10) and SGD + momentum for optimization, but it does not specify version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We train models with SGD + momentum to optimize the cross-entropy loss. We use the learning rate schedule, batch size, and data augmentations specified in the deterministic baseline provided by Nado et al. (2021). Figures 3a and 3b show the evolution of the total loss, bias, and variance of ensembles of independent WRN-28-10 models under the cross-entropy loss on the associated Cifar test sets. |
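The double-bootstrap procedure quoted in the Pseudocode row above can be sketched in Python. This is a minimal illustration, not the authors' code: the `train_and_eval` callback is a hypothetical helper standing in for the paper's `bias(...)` estimator (it would train one model per resampled dataset and return a bias estimate over the resulting ensemble, a step the pseudocode leaves abstract).

```python
import random

def bootstrap_bias(T, B, train_and_eval):
    """Double-bootstrap bias estimate (sketch of Algorithm 1).

    T: training set (a list of examples).
    B: number of bootstrap samples at each level.
    train_and_eval: hypothetical helper mapping a list of datasets to a
        scalar bias (or variance) estimate for the ensemble trained on them.
    """
    n = len(T)
    first_level_sets = []   # the T_i
    second_level = []       # the b(2)_i estimates
    for _ in range(B):
        # T_i <- uniform_sample(T), with replacement, of size |T|
        Ti = random.choices(T, k=n)
        first_level_sets.append(Ti)
        # T_ij <- uniform_sample(T_i), with replacement, of size |T|
        Tij_sets = [random.choices(Ti, k=n) for _ in range(B)]
        # b(2)_i: bootstrap estimate computed from the second-level samples
        second_level.append(train_and_eval(Tij_sets))
    b1 = train_and_eval(first_level_sets)   # b(1): estimate for T
    b2 = sum(second_level) / B              # b(2): mean second-level estimate
    t = b1 / b2                             # corrective term
    return t * b1                           # b(0): corrected bias estimate
```

The corrective term `t = b(1)/b(2)` rescales the first-level estimate by the ratio of the two bootstrap levels, the standard double-bootstrap trick for reducing the bias of the bootstrap estimate itself.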