Bias/Variance is not the same as Approximation/Estimation
Authors: Gavin Brown, Riccardo Ali
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present several results for losses expressible as a Bregman divergence: a broad family with a known bias-variance decomposition. Discussion and future directions are presented for more general losses, including the 0/1 classification loss. Figure 4 shows results increasing the depth of a decision tree. The left panel shows excess risk, and the bias/variance components. We observe the classical bias/variance trade-off, including overfitting, as the depth increases beyond a certain point. It is notable that the bias decreases to zero after depth 6. Does this imply the model is unbiased, in the sense that it has sufficient capacity to capture the full data distribution? |
| Researcher Affiliation | Academia | Gavin Brown EMAIL Department of Computer Science, The University of Manchester Riccardo Ali EMAIL Department of Computer Science & Technology, The University of Cambridge |
| Pseudocode | No | The paper describes algorithms and concepts using mathematical notation and prose, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Full code is available at https://github.com/profgavinbrown/ondecompositions |
| Open Datasets | No | We use a synthetic 1-d problem: x ∈ [0, 15], and the true label is y = x + 5 sin(2x) + ϵ, where ϵ is Gaussian noise with zero mean and σ = 3. Training data is n = 100 points, illustrated below. |
| Dataset Splits | Yes | We use a synthetic 1-d problem: x ∈ [0, 15], and the true label is y = x + 5 sin(2x) + ϵ, where ϵ is Gaussian noise with zero mean and σ = 3. Training data is n = 100 points, illustrated below. The function class F is defined as the set of all trained models obtained over T = 1000 independently sampled datasets, each of size n = 100. The best-in-class model is the minimum across the T trials, where the risk R(f) is approximated by a sample of uniformly spaced points at a resolution of 0.001, giving a total of n = 15,000 test points. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes the synthetic problem and model choices (e.g., decision tree depth, k-nn k value) but does not provide specific hyperparameter values for training, such as learning rates, batch sizes, or optimizer settings for the models used in the experiments. |
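The experimental protocol quoted above (T independently sampled synthetic datasets, a bias/variance decomposition estimated over the ensemble of trained models, and a dense uniform test grid) can be sketched in NumPy. This is not the authors' code (which is at the linked GitHub repository); it is a minimal illustration of the setup, substituting a simple k-NN regressor for the paper's models, with smaller T and a coarser test grid than the paper uses for speed.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(n=100, sigma=3.0):
    # Synthetic 1-d problem from the paper: x ~ U[0, 15],
    # y = x + 5*sin(2x) + eps, eps ~ N(0, sigma^2)
    x = rng.uniform(0, 15, n)
    y = x + 5 * np.sin(2 * x) + rng.normal(0, sigma, n)
    return x, y

def knn_predict(x_train, y_train, x_test, k=5):
    # Plain k-nearest-neighbour regression; k=5 is an illustrative
    # choice here, not a value taken from the paper.
    d = np.abs(x_test[:, None] - x_train[None, :])
    idx = np.argpartition(d, k, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

# Uniform test grid; the paper uses resolution 0.001 (15,000 points),
# we use a coarser grid here purely for speed.
x_test = np.linspace(0, 15, 1500)
f_star = x_test + 5 * np.sin(2 * x_test)  # noise-free target function

T = 200  # the paper uses T = 1000 independently sampled datasets
preds = np.stack([knn_predict(*sample_dataset(), x_test) for _ in range(T)])

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - f_star) ** 2)   # squared bias
variance = np.mean(preds.var(axis=0))          # variance over datasets
excess_risk = np.mean((preds - f_star) ** 2)   # risk, excluding the noise term

# For squared loss the decomposition is exact: risk = bias^2 + variance
assert np.isclose(excess_risk, bias_sq + variance)
print(f"bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

Squared loss is the simplest member of the Bregman-divergence family the paper works with, so the exact additive decomposition checked by the final assertion holds without approximation here.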