Limitation of Characterizing Implicit Regularization by Data-independent Functions

Authors: Leyang Zhang, Zhi-Qin John Xu, Tao Luo, Yaoyu Zhang

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We attempt to mathematically define and study implicit regularization. Importantly, we explore the limitations of a common approach to characterizing implicit regularization using data-independent functions. We propose two dynamical mechanisms, i.e., Two-point and One-point Overlapping mechanisms, based on which we provide two recipes for producing classes of one-hidden-neuron NNs that provably cannot be fully characterized by a type of or all data-independent functions. ... Experiments on such examples are also used to support our results.
Researcher Affiliation | Academia | Leyang Zhang (Department of Mathematics, College of Liberal Arts & Sciences, University of Illinois Urbana-Champaign); Zhi-Qin John Xu (School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University); Tao Luo (School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University; CMA-Shanghai, Shanghai Artificial Intelligence Laboratory); Yaoyu Zhang (School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University; Shanghai Center for Brain Science and Brain-Inspired Technology)
Pseudocode | No | The paper describes construction 'recipes' in sections such as 'Two-point Overlapping Recipe (Part A)' and 'One-point Overlapping Recipe' with numbered steps (a), (b), (c), but these are descriptive textual steps for a mathematical construction, not structured pseudocode or algorithm blocks formatted like code.
Open Source Code | No | The paper contains no explicit statement about releasing code and provides no links to code repositories.
Open Datasets | No | The paper uses abstractly defined datasets S = {(x_i, y_i) : i ∈ I} or constructed single-sample datasets for numerical examples, such as 'S = {(x, y)} ⊂ R^2' or 'S = {(x, a0σ(xw0))}'. It does not use or provide access information for any publicly available or open datasets.
Dataset Splits | No | The paper focuses on theoretical analysis and numerical examples using constructed single-sample datasets, so training, validation, and test splits are neither applicable nor mentioned.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU or CPU models, memory) used to run the numerical examples or simulations.
Software Dependencies | No | The paper does not specify any software or library dependencies, with or without version numbers, for its numerical examples or simulations.
Experiment Setup | Yes | Figure 1: 'We choose the initial point θ0 = (0.922, 2.868). The sample for (i) the blue lines is (x1, y1) = (1, 1); (ii) the orange lines is (x2, y2) = (12.307, 1.400).' Figure 2: 'We choose initial value θ0 = (w0, a0) = (0.3, 1). Then θ = (w0, a0) = (0.3, 1). The dataset for (i) blue lines is (x, y) = (0.6, a0σ(0.6w0)); (ii) orange lines is (x, y) = (1.0, a0σ(w0)); (iii) brown lines is (x, y) = (1.4, a0σ(1.4w0)); (iv) grey lines is (x, y) = (1.8, a0σ(1.8w0)).'
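The single-sample setups quoted for Figures 1 and 2 can be explored with a minimal Python sketch. Several pieces here are assumptions rather than details from the paper: the model is taken to be f(x) = a·σ(w·x) (following the a0σ(xw0) notation quoted for Figure 2), σ is taken to be tanh because the summary does not name the activation, the loss is a squared error, the ordering θ0 = (w0, a0) is reused for Figure 1, and gradient flow is approximated by plain gradient descent with a hypothetical step size and step count. The function names `sigma`, `train`, and `residual` are illustrative only.

```python
import math


def sigma(z):
    # ASSUMPTION: tanh activation; the source does not specify sigma.
    return math.tanh(z)


def dsigma(z):
    # Derivative of the assumed tanh activation.
    return 1.0 - math.tanh(z) ** 2


def residual(sample, theta):
    # Prediction error a * sigma(w * x) - y of the one-hidden-neuron model.
    x, y = sample
    w, a = theta
    return a * sigma(w * x) - y


def train(sample, theta0, lr=0.01, steps=5000):
    # Plain gradient descent on L(w, a) = 0.5 * residual**2.
    # ASSUMPTION: lr and steps are illustrative, not values from the paper.
    x, _ = sample
    w, a = theta0
    for _ in range(steps):
        r = residual(sample, (w, a))
        grad_w = r * a * dsigma(w * x) * x
        grad_a = r * sigma(w * x)
        w, a = w - lr * grad_w, a - lr * grad_a
    return w, a


# Figure 1 setup: one shared initial point, two different single samples.
theta0 = (0.922, 2.868)  # ASSUMPTION: ordered as (w0, a0)
samples = {"blue": (1.0, 1.0), "orange": (12.307, 1.400)}
endpoints = {name: train(s, theta0) for name, s in samples.items()}

for name, theta in endpoints.items():
    print(name, theta, residual(samples[name], theta))

# Figure 2 setup: every dataset is built so that (w0, a0) fits it exactly.
w0, a0 = 0.3, 1.0
fig2 = [(x, a0 * sigma(x * w0)) for x in (0.6, 1.0, 1.4, 1.8)]
```

Under these assumptions, the script prints the endpoint reached from the same θ0 for each Figure 1 sample along with its remaining residual, which makes it easy to compare how the limit point varies with the training sample. The `fig2` list reproduces the four single-sample datasets of Figure 2, each of which has zero residual at (w0, a0) by construction.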