Limitation of Characterizing Implicit Regularization by Data-independent Functions
Authors: Leyang Zhang, Zhi-Qin John Xu, Tao Luo, Yaoyu Zhang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We attempt to mathematically define and study implicit regularization. Importantly, we explore the limitations of a common approach to characterizing implicit regularization using data-independent functions. We propose two dynamical mechanisms, i.e., Two-point and One-point Overlapping mechanisms, based on which we provide two recipes for producing classes of one-hidden-neuron NNs that provably cannot be fully characterized by a type of or all data-independent functions. ... Experiments on such examples are also used to support our results. |
| Researcher Affiliation | Academia | Leyang Zhang: Department of Mathematics, College of Liberal Arts & Sciences, University of Illinois Urbana-Champaign; Zhi-Qin John Xu: School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University; Tao Luo: School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, CMA-Shanghai, Shanghai Artificial Intelligence Laboratory; Yaoyu Zhang: School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai Center for Brain Science and Brain-Inspired Technology |
| Pseudocode | No | The paper describes construction 'recipes' in sections like 'Two-point Overlapping Recipe (Part A)' and 'One-point Overlapping Recipe' with numbered steps (a), (b), (c), but these are descriptive textual steps for mathematical construction, not structured pseudocode or algorithm blocks formatted like code. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to code repositories. |
| Open Datasets | No | The paper uses abstractly defined datasets $S = \{(x_i, y_i) : i \in I\}$ or constructed single-sample datasets for numerical examples, such as $S = \{(x, y)\} \subset \mathbb{R}^2$ or $S = \{(x, a_0\sigma(x w_0))\}$. It does not use or provide access information for any publicly available or open datasets. |
| Dataset Splits | No | The paper focuses on theoretical analysis and numerical examples using constructed single-sample datasets; training, validation, and test splits are therefore neither applicable nor mentioned. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used to run the numerical examples or simulations. |
| Software Dependencies | No | The paper does not specify any particular software or library dependencies with version numbers used for its numerical examples or simulations. |
| Experiment Setup | Yes | Figure 1: 'We choose the initial point $\theta_0 = (0.922, 2.868)$. The sample for (i) the blue lines is $(x_1, y_1) = (1, 1)$; (ii) the orange lines is $(x_2, y_2) = (12.307, 1.400)$.' Figure 2: 'We choose initial value $\theta_0 = (w_0, a_0) = (0.3, 1)$. Then $\theta = (w_0, a_0) = (0.3, 1)$. The dataset for (i) blue lines is $(x, y) = (0.6, a_0\sigma(0.6 w_0))$; (ii) orange lines is $(x, y) = (1.0, a_0\sigma(w_0))$; (iii) brown lines is $(x, y) = (1.4, a_0\sigma(1.4 w_0))$; (iv) grey lines is $(x, y) = (1.8, a_0\sigma(1.8 w_0))$.' |
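In the Figure 2 setup, each single-sample dataset is generated from the initial point itself: with $\theta_0 = (w_0, a_0) = (0.3, 1)$, every target is $y = a_0\sigma(x w_0)$. The sketch below is a hypothetical illustration, not the authors' code. It checks that $\theta_0$ fits all four listed inputs. It assumes the one-hidden-neuron network is $f(x; w, a) = a\sigma(wx)$ and uses `tanh` as a stand-in for $\sigma$, since the report does not name the activation. It also assumes a squared loss. The helper names `loss_and_grad` and `figure2_check` are illustrative only.

```python
import math

# Assumption (not stated in the report): the one-hidden-neuron network is
# f(x; w, a) = a * sigma(w * x), and tanh stands in for the unnamed activation.
def sigma(z: float) -> float:
    return math.tanh(z)

def f(x: float, w: float, a: float) -> float:
    return a * sigma(w * x)

def loss_and_grad(x: float, y: float, w: float, a: float):
    """Assumed squared loss 0.5 * (f - y)^2 on one sample, and its gradient in (w, a)."""
    r = f(x, w, a) - y                   # residual
    dsig = 1.0 - math.tanh(w * x) ** 2   # derivative of tanh at w * x
    grad_w = r * a * dsig * x            # d loss / d w
    grad_a = r * sigma(w * x)            # d loss / d a
    return 0.5 * r * r, (grad_w, grad_a)

def figure2_check(w0: float = 0.3, a0: float = 1.0, xs=(0.6, 1.0, 1.4, 1.8)):
    """Build each dataset {(x, a0 * sigma(x * w0))} and evaluate loss and gradient at theta_0."""
    rows = []
    for x in xs:
        y = a0 * sigma(x * w0)           # target generated by theta_0 itself
        loss, grad = loss_and_grad(x, y, w0, a0)
        rows.append((x, y, loss, grad))
    return rows

for x, y, loss, grad in figure2_check():
    print(f"x = {x}: y = {y:.4f}, loss = {loss}, grad = {grad}")
```

Every row reports zero loss and a zero gradient. Under the squared-loss assumption, the residual vanishes by construction for any differentiable $\sigma$ substituted for `tanh`. Gradient descent started at $\theta_0$ therefore stays there on each of the four datasets.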