On the Optimality of Misspecified Spectral Algorithms
Authors: Haobo Zhang, Yicheng Li, Qian Lin
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show that spectral algorithms are minimax optimal for any α0 − 1/β < s < 1, where β is the eigenvalue decay rate of H. We also give several classes of RKHSs whose embedding index satisfies α0 = 1/β. Thus, the spectral algorithms are minimax optimal for all s ∈ (0, 1) on these RKHSs. ... We verify our results through experiments in Section 5. |
| Researcher Affiliation | Academia | Haobo Zhang, Yicheng Li, Qian Lin; Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes algorithms (spectral algorithms, kernel ridge regression, gradient flow, spectral cut-off) in a narrative and mathematical style, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a link to a code repository or indicate code availability in supplementary materials. |
| Open Datasets | No | The experiments section uses synthetic data: "Suppose that X = [0, 1] and the marginal distribution µ is the uniform distribution on [0, 1]." and "x ∼ U[0, 1] and ϵ ∼ N(0, 1)." No existing public datasets are used or cited with access information. |
| Dataset Splits | No | The paper generates synthetic data for its experiments. It states: "The sample size n is chosen from 1000 to 5000, with intervals of 100. We numerically compute the generalization error ∥f̂ − f∗∥_{L2} by Simpson's formula with N ≫ n testing points." This describes data generation and evaluation, but not predefined train/test/validation splits of an existing dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments described in Section 5. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions) that were used in the experiments described in Section 5. |
| Experiment Setup | Yes | We consider the following data generation procedure: y = f∗(x) + ϵ, where f∗ is numerically approximated by the first 3000 terms in (17) or (18) with s = 0.4, x ∼ U[0, 1] and ϵ ∼ N(0, 1). Three kinds of spectral algorithms (kernel ridge regression, gradient flow and spectral cut-off) are used to construct estimators f̂ for each RKHS, where we choose the regularization parameter as ν = c·n^(−β/(sβ+1)) = c·n^(−10/9) for a fixed c. The sample size n is chosen from 1000 to 5000, with intervals of 100. We numerically compute the generalization error ∥f̂ − f∗∥_{L2} by Simpson's formula with N ≫ n testing points. For each n, we repeat the experiments 50 times and present the average generalization error as well as the region within one standard deviation. To visualize the convergence rate r, we perform a logarithmic least-squares fit log err = r·log n + b of the generalization error against the sample size and display the value of r. |
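The quoted setup can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: it uses kernel ridge regression only (one of the three spectral algorithms), the Brownian-motion (min) kernel as an example RKHS with eigenvalue decay β = 2 (so ν = c·n^(−10/9) as in the quote), and a simple sine target as a stand-in for the paper's truncated eigenfunction series (17)/(18). Smaller sample sizes are used here to keep the sketch fast.

```python
import numpy as np

def min_kernel(s, t):
    # Brownian-motion kernel K(s, t) = min(s, t) on [0, 1];
    # its eigenvalues decay like i^(-2), i.e. beta = 2.
    return np.minimum(s, t)

def krr_fit(x_train, y_train, nu):
    """Kernel ridge regression: f_hat(x) = K(x, X) (K + n*nu*I)^{-1} y."""
    n = len(x_train)
    K = min_kernel(x_train[:, None], x_train[None, :])
    alpha = np.linalg.solve(K + n * nu * np.eye(n), y_train)
    return lambda x: min_kernel(x[:, None], x_train[None, :]) @ alpha

def simpson(values, h):
    """Composite Simpson's rule; len(values) must be odd."""
    w = np.ones(len(values))
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    return h / 3.0 * (w @ values)

# Stand-in target function (the paper instead truncates an eigenfunction series).
f_star = lambda x: np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)

def l2_error(n, c=1.0):
    x = rng.uniform(0.0, 1.0, n)
    y = f_star(x) + rng.normal(size=n)           # eps ~ N(0, 1)
    f_hat = krr_fit(x, y, nu=c * n ** (-10 / 9))  # nu = c * n^(-beta/(s*beta+1))
    grid = np.linspace(0.0, 1.0, 2001)            # N >> n testing points
    return np.sqrt(simpson((f_hat(grid) - f_star(grid)) ** 2, grid[1] - grid[0]))

ns = np.array([200, 400, 800, 1600])
errs = np.array([l2_error(n) for n in ns])
# Log-log least-squares fit: log err = r * log n + b
r, b = np.polyfit(np.log(ns), np.log(errs), 1)
print("errors:", errs.round(4).tolist(), "fitted rate r =", round(r, 3))
```

A negative fitted slope r indicates the error is shrinking polynomially in n; the paper compares the fitted r against the theoretical minimax rate for each algorithm/RKHS pair.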