On the Optimality of Misspecified Spectral Algorithms
Authors: Haobo Zhang, Yicheng Li, Qian Lin
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show that spectral algorithms are minimax optimal for any α0 − 1/β < s < 1, where β is the eigenvalue decay rate of H. We also give several classes of RKHSs whose embedding index satisfies α0 = 1/β. Thus, the spectral algorithms are minimax optimal for all s ∈ (0, 1) on these RKHSs. ... We verify our results through experiments in Section 5. |
| Researcher Affiliation | Academia | Haobo Zhang, Yicheng Li, Qian Lin; Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes algorithms (spectral algorithms, kernel ridge regression, gradient flow, spectral cut-off) in a narrative and mathematical style, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a link to a code repository or indicate code availability in supplementary materials. |
| Open Datasets | No | The experiments section uses synthetic data: "Suppose that X = [0, 1] and the marginal distribution µ is the uniform distribution on [0, 1]." and "x ∼ U[0, 1] and ϵ ∼ N(0, 1)." No existing public datasets are used or cited with access information. |
| Dataset Splits | No | The paper generates synthetic data for its experiments. It states: "The sample size n is chosen from 1000 to 5000, with intervals of 100. We numerically compute the generalization error ∥f̂ − f∗∥_{L2} by Simpson's formula with N ≫ n testing points." This describes data generation and evaluation, but not predefined train/test/validation splits of an existing dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments described in Section 5. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions) that were used in the experiments described in Section 5. |
| Experiment Setup | Yes | We consider the following data generation procedure: y = f∗(x) + ϵ, where f∗ is numerically approximated by the first 3000 terms in (17) or (18) with s = 0.4, x ∼ U[0, 1] and ϵ ∼ N(0, 1). Three kinds of spectral algorithms (kernel ridge regression, gradient flow and spectral cut-off) are used to construct estimators f̂ for each RKHS, where we choose the regularization parameter as ν = c·n^(−β/(sβ+1)) = c·n^(−10/9) for a fixed c. The sample size n is chosen from 1000 to 5000, with intervals of 100. We numerically compute the generalization error ∥f̂ − f∗∥_{L2} by Simpson's formula with N ≫ n testing points. For each n, we repeat the experiments 50 times and present the average generalization error as well as the region within one standard deviation. To visualize the convergence rate r, we perform a logarithmic least-squares fit log err = r·log n + b of the generalization error against the sample size and display the value of r. |
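The quoted setup can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: it uses kernel ridge regression only (one of the three spectral algorithms), the Brownian-motion (min) kernel as an example RKHS with eigenvalue decay β = 2 (so ν = c·n^(−10/9) as in the quote), and a simple sine target as a stand-in for the paper's truncated eigenfunction series (17)/(18). Smaller sample sizes are used here to keep the sketch fast.

```python
import numpy as np

def min_kernel(s, t):
    # Brownian-motion kernel K(s, t) = min(s, t) on [0, 1];
    # its eigenvalues decay like i^(-2), i.e. beta = 2.
    return np.minimum(s, t)

def krr_fit(x_train, y_train, nu):
    """Kernel ridge regression: f_hat(x) = K(x, X) (K + n*nu*I)^{-1} y."""
    n = len(x_train)
    K = min_kernel(x_train[:, None], x_train[None, :])
    alpha = np.linalg.solve(K + n * nu * np.eye(n), y_train)
    return lambda x: min_kernel(x[:, None], x_train[None, :]) @ alpha

def simpson(values, h):
    """Composite Simpson's rule; len(values) must be odd."""
    w = np.ones(len(values))
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    return h / 3.0 * (w @ values)

# Stand-in target function (the paper instead truncates an eigenfunction series).
f_star = lambda x: np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)

def l2_error(n, c=1.0):
    x = rng.uniform(0.0, 1.0, n)
    y = f_star(x) + rng.normal(size=n)           # eps ~ N(0, 1)
    f_hat = krr_fit(x, y, nu=c * n ** (-10 / 9))  # nu = c * n^(-beta/(s*beta+1))
    grid = np.linspace(0.0, 1.0, 2001)            # N >> n testing points
    return np.sqrt(simpson((f_hat(grid) - f_star(grid)) ** 2, grid[1] - grid[0]))

ns = np.array([200, 400, 800, 1600])
errs = np.array([l2_error(n) for n in ns])
# Log-log least-squares fit: log err = r * log n + b
r, b = np.polyfit(np.log(ns), np.log(errs), 1)
print("errors:", errs.round(4).tolist(), "fitted rate r =", round(r, 3))
```

A negative fitted slope r indicates the error is shrinking polynomially in n; the paper compares the fitted r against the theoretical minimax rate for each algorithm/RKHS pair.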