Unbiased estimators for random design regression

Authors: Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For each estimator we plotted the loss L_D(ŵ) for a range of sample sizes k, contrasted with the loss of the best least-squares estimator w computed from all data. Plots shown in Figure 6.2 were averaged over 100 runs, with shaded area representing standard error of the mean. We used six benchmark datasets from the libsvm repository (Chang and Lin, 2011), whose dimensions are given in Table 6.1.
Researcher Affiliation | Collaboration | Michał Dereziński (EMAIL), Department of Electrical Engineering & Computer Science, University of Michigan; Manfred K. Warmuth (EMAIL), UC Santa Cruz and Google Inc.; Daniel Hsu (EMAIL), Department of Computer Science, Columbia University
Pseudocode | Yes | Algorithm 1: Distortion-free intermediate sampling; Algorithm 2: Reverse iterative sampling (Dereziński and Warmuth, 2018)
Open Source Code | No | The paper does not provide concrete access to source code or explicitly state that the code is open-source or provided in supplementary materials.
Open Datasets | Yes | We used six benchmark datasets from the libsvm repository (Chang and Lin, 2011), whose dimensions are given in Table 6.1.
Dataset Splits | No | The paper mentions evaluating estimators for a range of sample sizes and averaging results over runs, but does not provide specific train/test/validation splits for the datasets used in experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies, including library or solver names with version numbers, used to replicate the experiments.
Experiment Setup | Yes | For each estimator we plotted the loss L_D(ŵ) for a range of sample sizes k, contrasted with the loss of the best least-squares estimator w computed from all data. Plots shown in Figure 6.2 were averaged over 100 runs, with shaded area representing standard error of the mean.
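
The evaluation protocol described under Experiment Setup (loss over a range of sample sizes k, averaged over 100 runs, with the standard error of the mean) could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic and one-dimensional rather than the libsvm benchmarks, plain uniform subsampling stands in for the paper's estimators, and every function name here (`fit_least_squares_1d`, `square_loss`, `mean_and_sem`, `evaluate`) is hypothetical rather than taken from the paper.

```python
import math
import random
import statistics


def fit_least_squares_1d(xs, ys):
    """Least-squares slope for the model y ~ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def square_loss(w, xs, ys):
    """Mean squared error of slope w over the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mean_and_sem(values):
    """Sample mean and standard error of the mean (stdev / sqrt(n))."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def evaluate(xs, ys, sample_sizes, runs=100, seed=0):
    """For each k, fit on `runs` random size-k subsamples and report the
    mean and SEM of the loss measured on the full dataset."""
    rng = random.Random(seed)
    results = {}
    for k in sample_sizes:
        losses = []
        for _ in range(runs):
            # ASSUMPTION: uniform sampling without replacement, used here
            # as a stand-in for the sampling schemes studied in the paper.
            chosen = rng.sample(range(len(xs)), k)
            w_hat = fit_least_squares_1d([xs[i] for i in chosen],
                                         [ys[i] for i in chosen])
            losses.append(square_loss(w_hat, xs, ys))
        results[k] = mean_and_sem(losses)
    return results


# Synthetic stand-in data (not one of the paper's datasets).
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [2.0 * x + data_rng.gauss(0.0, 0.5) for x in xs]

# Reference point: loss of the least-squares fit computed from all data.
full_loss = square_loss(fit_least_squares_1d(xs, ys), xs, ys)
results = evaluate(xs, ys, sample_sizes=[5, 10, 20, 40])

for k, (mean_loss, sem) in results.items():
    print(f"k={k}: loss {mean_loss:.4f} +/- {sem:.4f} (full data: {full_loss:.4f})")
```

Because the full-data fit minimizes the loss on the full dataset, each subsampled loss is bounded below by `full_loss`, which is why it serves as the reference line in this kind of plot.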