Bias/Variance is not the same as Approximation/Estimation
Authors: Gavin Brown, Riccardo Ali
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present several results for losses expressible as a Bregman divergence: a broad family with a known bias-variance decomposition. Discussion and future directions are presented for more general losses, including the 0/1 classification loss. Figure 4 shows results increasing the depth of a decision tree. The left panel shows excess risk, and the bias/variance components. We observe the classical bias/variance trade-off, including overfitting, as the depth increases beyond a certain point. It is notable that the bias decreases to zero after depth 6. Does this imply the model is unbiased, in the sense that it has sufficient capacity to capture the full data distribution? |
| Researcher Affiliation | Academia | Gavin Brown EMAIL Department of Computer Science, The University of Manchester Riccardo Ali EMAIL Department of Computer Science & Technology, The University of Cambridge |
| Pseudocode | No | The paper describes algorithms and concepts using mathematical notation and prose, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Full code is available at https://github.com/profgavinbrown/ondecompositions |
| Open Datasets | No | We use a synthetic 1-d problem: x ∈ [0, 15], and the true label is y = x + 5 sin(2x) + ϵ, where ϵ is Gaussian noise with zero mean and σ = 3. Training data is n = 100 points, illustrated below. |
| Dataset Splits | Yes | We use a synthetic 1-d problem: x ∈ [0, 15], and the true label is y = x + 5 sin(2x) + ϵ, where ϵ is Gaussian noise with zero mean and σ = 3. Training data is n = 100 points, illustrated below. The function class F is defined as the set of all trained models obtained over T = 1000 independently sampled datasets, each of size n = 100. The best-in-class model is the minimum across the T trials, where the risk R(f) is approximated by a sample of uniformly spaced points at a resolution of 0.001, giving a total of n = 15,000 test points. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes the synthetic problem and model choices (e.g., decision tree depth, k-nn k value) but does not provide specific hyperparameter values for training, such as learning rates, batch sizes, or optimizer settings for the models used in the experiments. |
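The experimental protocol quoted above (T independently sampled synthetic datasets, a bias/variance decomposition estimated over the ensemble of trained models, and a dense uniform test grid) can be sketched in NumPy. This is not the authors' code (which is at the linked GitHub repository); it is a minimal illustration of the setup, substituting a simple k-NN regressor for the paper's models, with smaller T and a coarser test grid than the paper uses for speed.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(n=100, sigma=3.0):
    # Synthetic 1-d problem from the paper: x ~ U[0, 15],
    # y = x + 5*sin(2x) + eps, eps ~ N(0, sigma^2)
    x = rng.uniform(0, 15, n)
    y = x + 5 * np.sin(2 * x) + rng.normal(0, sigma, n)
    return x, y

def knn_predict(x_train, y_train, x_test, k=5):
    # Plain k-nearest-neighbour regression; k=5 is an illustrative
    # choice here, not a value taken from the paper.
    d = np.abs(x_test[:, None] - x_train[None, :])
    idx = np.argpartition(d, k, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

# Uniform test grid; the paper uses resolution 0.001 (15,000 points),
# we use a coarser grid here purely for speed.
x_test = np.linspace(0, 15, 1500)
f_star = x_test + 5 * np.sin(2 * x_test)  # noise-free target function

T = 200  # the paper uses T = 1000 independently sampled datasets
preds = np.stack([knn_predict(*sample_dataset(), x_test) for _ in range(T)])

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - f_star) ** 2)   # squared bias
variance = np.mean(preds.var(axis=0))          # variance over datasets
excess_risk = np.mean((preds - f_star) ** 2)   # risk, excluding the noise term

# For squared loss the decomposition is exact: risk = bias^2 + variance
assert np.isclose(excess_risk, bias_sq + variance)
print(f"bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

Squared loss is the simplest member of the Bregman-divergence family the paper works with, so the exact additive decomposition checked by the final assertion holds without approximation here.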