Bias/Variance is not the same as Approximation/Estimation

Authors: Gavin Brown, Riccardo Ali

TMLR 2024

Reproducibility assessment — each row gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental — "We present several results for losses expressible as a Bregman divergence: a broad family with a known bias-variance decomposition. Discussion and future directions are presented for more general losses, including the 0/1 classification loss. Figure 4 shows results increasing the depth of a decision tree. The left panel shows excess risk and the bias/variance components. We observe the classical bias/variance trade-off, including overfitting, as the depth increases beyond a certain point. It is notable that the bias decreases to zero after depth 6. Does this imply the model is unbiased, in the sense that it has sufficient capacity to capture the full data distribution?"
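The Figure 4 experiment quoted above can be sketched with the standard squared-loss bias-variance decomposition (squared loss is a Bregman divergence, so it falls in the family the paper covers). This is a minimal illustrative sketch, not the authors' code: the trial count, tree depths, and test-grid size are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def make_data(n=100):
    # Synthetic 1-d problem from the paper: y = x + 5*sin(2x) + eps, eps ~ N(0, 3^2)
    x = rng.uniform(0, 15, size=(n, 1))
    y = x.ravel() + 5 * np.sin(2 * x.ravel()) + rng.normal(0, 3, size=n)
    return x, y

# Fixed test grid; targets are the noise-free regression function, so the
# expected squared error decomposes into bias^2 + variance (no noise term).
x_test = np.linspace(0, 15, 1501).reshape(-1, 1)
f_true = x_test.ravel() + 5 * np.sin(2 * x_test.ravel())

def bias_variance(depth, trials=200):
    # Retrain on independently sampled datasets and collect predictions.
    preds = np.empty((trials, len(x_test)))
    for t in range(trials):
        x, y = make_data()
        preds[t] = DecisionTreeRegressor(max_depth=depth).fit(x, y).predict(x_test)
    mean_pred = preds.mean(axis=0)  # the "centroid" model under squared loss
    bias2 = np.mean((mean_pred - f_true) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias2, variance

for depth in (2, 6, 12):
    b2, v = bias_variance(depth)
    print(f"depth={depth:2d}  bias^2={b2:7.3f}  variance={v:7.3f}")
```

Shallow trees show high bias and low variance; deep trees the reverse — the classical trade-off the quoted passage describes for Figure 4's left panel.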
Researcher Affiliation: Academia — "Gavin Brown EMAIL Department of Computer Science, The University of Manchester; Riccardo Ali EMAIL Department of Computer Science & Technology, The University of Cambridge"
Pseudocode: No — "The paper describes algorithms and concepts using mathematical notation and prose, but does not include any explicitly labeled pseudocode or algorithm blocks."
Open Source Code: Yes — "Full code is available at https://github.com/profgavinbrown/ondecompositions"
Open Datasets: No — "We use a synthetic 1-d problem: x ∈ [0, 15], and the true label is y = x + 5 sin(2x) + ϵ, where ϵ is Gaussian noise with zero mean and σ = 3. Training data is n = 100 points, illustrated below."
Dataset Splits: Yes — "We use a synthetic 1-d problem: x ∈ [0, 15], and the true label is y = x + 5 sin(2x) + ϵ, where ϵ is Gaussian noise with zero mean and σ = 3. Training data is n = 100 points, illustrated below. The function class F is defined as the set of all trained models obtained over T = 1000 independently sampled datasets, each of size n = 100. The best-in-class model is the minimum across the T trials, where the risk R(f) is approximated over uniformly sampled test points at a resolution of 0.001, giving a total of n = 15,000 test points."
Hardware Specification: No — "The paper does not provide any specific details about the hardware used to run the experiments."
Software Dependencies: No — "The paper does not explicitly mention any specific software dependencies with version numbers."
Experiment Setup: No — "The paper describes the synthetic problem and model choices (e.g., decision tree depth, k-NN k value) but does not provide specific hyperparameter values for training, such as learning rates, batch sizes, or optimizer settings for the models used in the experiments."