Ensembles of Classifiers: a Bias-Variance Perspective
Authors: Neha Gupta, Jamie Smith, Ben Adlam, Zelda E Mariet
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then investigate the empirical behavior of ensembles of neural networks. We first contrast the behaviors of the two natural ensembling choices that arise from the generalized BVD. Empirically, we see that such ensembles of neural networks may reduce the bias. We conclude with an empirical analysis of ensembles over neural network architecture hyperparameters, revealing that these techniques allow for more efficient bias reduction than standard ensembles. |
| Researcher Affiliation | Collaboration | Neha Gupta EMAIL Stanford University Jamie Smith EMAIL Google Ben Adlam EMAIL Google Zelda Mariet EMAIL Google |
| Pseudocode | Yes | Algorithm 1 (Bootstrap estimate of bias or variance). Input: training set T, number of bootstrap samples B. For i in {1, ..., B}: T_i ← uniform_sample(T) of size \|T\|; for j in {1, ..., B}: T_ij ← uniform_sample(T_i) of size \|T\|; b(2)_i ← bias({T_ij}_j), the bootstrap estimate for T_i. Then b(1) ← bias({T_i}_i), the bootstrap estimate for T; b(2) ← (1/B) Σ_i b(2)_i; t ← b(1)/b(2), the corrective term; b(0) ← t·b(1). Return the bias estimate b(0). |
| Open Source Code | No | The paper does not contain any explicit statements about the release of its own source code, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Empirically, we see on Cifar10 and Cifar100 that primal ensembling neural networks under the cross-entropy loss tends to decrease the bias... Figure 3c stratifies the decomposition by corruption intensity on the corrupted Cifar-100 (Hendrycks & Dietterich, 2019) test set. Figure 8: Conditional bias, variance, and NLL of primal and dual WRN-28-10 ensembles on the SVHN test set (left) and the corrupted Cifar10 and Cifar100 test sets (averaged over all corruptions). |
| Dataset Splits | Yes | To compare the conditional and bootstrapped estimates, we trained Wide ResNets (WRNs) (Zagoruyko & Komodakis, 2016) with the cross-entropy loss on different disjoint partitionings of the CIFAR-10 dataset. Figure 1a uses 50 partitions of 1k training points, and Figure 1b uses 20 partitions of 2.5k training points. We use the learning rate schedule, batch size, and data augmentations specified in the deterministic baseline provided by Nado et al. (2021). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other accelerator specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of Wide Residual Networks (WRN-28-10) and SGD + momentum for optimization, but it does not specify version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We train models with SGD + momentum to optimize the cross-entropy loss. We use the learning rate schedule, batch size, and data augmentations specified in the deterministic baseline provided by Nado et al. (2021). Figures 3a and 3b show the evolution of the total loss, bias, and variance of ensembles of independent WRN-28-10 models under the cross-entropy loss on the associated Cifar test sets. |
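The double-bootstrap procedure quoted in the Pseudocode row above can be sketched in Python. This is a minimal illustration, not the authors' code: the `train_and_eval` callback is a hypothetical helper standing in for the paper's `bias(...)` estimator (it would train one model per resampled dataset and return a bias estimate over the resulting ensemble, a step the pseudocode leaves abstract).

```python
import random

def bootstrap_bias(T, B, train_and_eval):
    """Double-bootstrap bias estimate (sketch of Algorithm 1).

    T: training set (a list of examples).
    B: number of bootstrap samples at each level.
    train_and_eval: hypothetical helper mapping a list of datasets to a
        scalar bias (or variance) estimate for the ensemble trained on them.
    """
    n = len(T)
    first_level_sets = []   # the T_i
    second_level = []       # the b(2)_i estimates
    for _ in range(B):
        # T_i <- uniform_sample(T), with replacement, of size |T|
        Ti = random.choices(T, k=n)
        first_level_sets.append(Ti)
        # T_ij <- uniform_sample(T_i), with replacement, of size |T|
        Tij_sets = [random.choices(Ti, k=n) for _ in range(B)]
        # b(2)_i: bootstrap estimate computed from the second-level samples
        second_level.append(train_and_eval(Tij_sets))
    b1 = train_and_eval(first_level_sets)   # b(1): estimate for T
    b2 = sum(second_level) / B              # b(2): mean second-level estimate
    t = b1 / b2                             # corrective term
    return t * b1                           # b(0): corrected bias estimate
```

The corrective term `t = b(1)/b(2)` rescales the first-level estimate by the ratio of the two bootstrap levels, the standard double-bootstrap trick for reducing the bias of the bootstrap estimate itself.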