Early Stopping for Iterative Regularization with General Loss Functions

Authors: Ting Hu, Yunwen Lei

JMLR 2022

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper, we investigate the early stopping strategy for iterative regularization based on gradient descent of convex loss functions in reproducing kernel Hilbert spaces, without an explicit regularization term. This work shows that projecting the last iterate at the stopping time produces an estimator with improved generalization ability. Using the upper bound on the generalization error, we establish a close link between iterative regularization and the Tikhonov regularization scheme, and explain theoretically why the two schemes exhibit similar regularization paths in existing numerical simulations. We introduce a data-dependent, cross-validation-based way to select the stopping time, and prove that this a-posteriori selection retains generalization errors comparable to those obtained by our stopping rules with a-priori parameters.
Researcher Affiliation Academia Ting Hu EMAIL, Center for Intelligent Decision-Making and Machine Learning, School of Management, Xi'an Jiaotong University, Xi'an, China; Yunwen Lei EMAIL, Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong, China
Pseudocode No The paper describes the iterative algorithm in Definition 1 using mathematical notation (Eq. 2.2), but it does not present it in a structured pseudocode or algorithm block.
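The iterative regularization scheme the paper analyzes (plain gradient descent on a convex empirical loss in an RKHS, with no penalty term) can be sketched as below. The Gaussian kernel, least-squares loss, and all function names here are illustrative assumptions for the sketch, not the paper's Definition 1 verbatim; the paper covers general convex losses.

```python
import numpy as np

def kernel_gradient_descent(X, y, kernel, steps, eta):
    """Sketch of unpenalized kernel gradient descent (iterative regularization).

    The iterate f_t is represented by its coefficient vector c, so that
    f_t(x) = sum_i c[i] * kernel(x_i, x).  Least-squares loss is an
    illustrative choice; the stopping time (how many steps to run) is the
    only regularization parameter.
    """
    m = len(y)
    K = kernel(X, X)            # m x m Gram matrix
    c = np.zeros(m)             # start from the zero function f_1 = 0
    iterates = []
    for t in range(steps):
        residual = K @ c - y            # f_t(x_i) - y_i on the sample
        c = c - (eta / m) * residual    # functional gradient step, no penalty
        iterates.append(c.copy())
    return iterates, K

def gaussian_kernel(A, B, sigma=1.0):
    """Illustrative Gaussian (RBF) kernel between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))
```

Running few iterations yields a smooth, heavily regularized estimator; running too many overfits the sample, which is why the choice of stopping time matters.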
Open Source Code No The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets No The paper refers to a generic 'sample set D = {(x_i, y_i)}_{i=1}^m ⊂ Z drawn from the unknown ρ' and discusses various loss functions and conditions, but it does not provide access information (link, DOI, repository, or formal citation) for any specific dataset used in experiments. Examples like CIFAR-10 are mentioned in a general discussion, not as datasets used by the authors.
Dataset Splits Yes In the rest of the section, we assume that the data size m = 2n is even with some n ∈ ℕ and that the data set D is the disjoint union of two data subsets, D1 (the training set) and D2 (the validation set), of equal cardinality |D1| = |D2| = n.
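The a-posteriori, data-dependent selection described above (split D into a training half D1 and a validation half D2, then pick the stopping time that minimizes validation error) might look like the following sketch. The squared-loss validation criterion and the `run_iterates` interface are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

def select_stopping_time(D1, D2, run_iterates, max_steps):
    """Hold-out selection of the stopping time T.

    run_iterates(X_train, y_train, max_steps) is assumed to return, for each
    t = 1..max_steps, a callable estimator f_t fitted on D1.  The chosen T
    minimizes the empirical validation error on D2 (squared loss here is an
    illustrative choice for a general convex loss).
    """
    (X1, y1), (X2, y2) = D1, D2
    estimators = run_iterates(X1, y1, max_steps)
    val_errors = [np.mean((f(X2) - y2) ** 2) for f in estimators]
    T = int(np.argmin(val_errors)) + 1   # stopping times are 1-indexed
    return T, estimators[T - 1]
```

The paper's theoretical contribution is that this purely empirical choice of T attains generalization errors comparable to the a-priori stopping rules.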
Hardware Specification No The paper focuses on theoretical analysis and does not describe any specific hardware used for experiments.
Software Dependencies No The paper discusses algorithmic approaches and mathematical frameworks (e.g., 'gradient descent', 'Tikhonov regularization', 'Reproducing Kernel Hilbert Spaces') but does not specify any software libraries, tools, or their versions used for implementation or analysis.
Experiment Setup No The paper defines theoretical parameters such as step-size sequences (η_t = η t^θ) and stopping times (T = m^α), and discusses their theoretical properties for convergence rates. It does not provide concrete hyperparameter values or system-level training settings for a practical experiment.
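To make the a-priori parameterization quoted above concrete, here is a small sketch of a polynomial step-size sequence together with a stopping time of the form T = m^α. The default values of η, θ, and α are placeholders, not values taken from the paper.

```python
import math

def a_priori_schedule(m, eta=1.0, theta=0.0, alpha=0.5):
    """A-priori step sizes eta_t = eta * t**theta and stopping time T = ceil(m**alpha).

    The default exponents are illustrative placeholders, not the paper's
    theoretically optimal choices; a negative theta yields a decaying
    step-size sequence, theta = 0 a constant one.
    """
    T = math.ceil(m ** alpha)                       # stopping time grows polynomially in m
    steps = [eta * t ** theta for t in range(1, T + 1)]
    return T, steps
```

The point of such rules is that both the number of iterations and the step sizes are fixed in advance as functions of the sample size m alone, in contrast to the data-dependent hold-out selection.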