Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs
Authors: Hancheng Min, Rene Vidal
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper shows that for certain data distributions one can learn a provably robust classifier using standard learning methods and without adding a defense mechanism. More specifically, it addresses the problem of finding a robust classifier for a binary classification problem in which the data come from an isotropic mixture of Gaussians with orthonormal cluster centers... Our second set of results develops a full convergence analysis for gradient flow on a two-layer pReLU network and shows: Theorem (Theorem 1 & Corollary 1, informal). When the intra-cluster variance α² is sufficiently small, gradient flow on pReLU networks (5) with p > 2 converges to a nearly optimal robust classifier. Appendix B: Additional Experiments on Learning Robust Classifiers for Data from Orthonormal GMMs |
| Researcher Affiliation | Academia | ¹Center for Innovation in Data Engineering and Science (IDEAS), ²Department of Electrical and Systems Engineering, ³Department of Radiology, University of Pennsylvania, Philadelphia, U.S.A. Correspondence to: Hancheng Min <EMAIL>. |
| Pseudocode | No | The paper describes mathematical derivations and theoretical proofs related to gradient flow and network architectures. It does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. There are no links to repositories, nor explicit statements about code release. |
| Open Datasets | No | The paper uses a synthetic dataset generated based on the 'Orthonormal Gaussian Mixture Model' described within the paper. It specifies 'Consider a balanced mixture of K Gaussians in R^D' and later 'synthetic GMM dataset of size n = 5000'. It does not refer to a publicly available, pre-existing dataset with a link, DOI, or formal citation. |
| Dataset Splits | No | The paper describes generating a balanced dataset D̂ = {(x_i, y_i)}_{i=1}^{KN} of a certain size ('n = 5000' or '20000' in experiments). While it defines the characteristics of this synthetic dataset, it does not provide explicit training/test/validation splits for experiment reproduction. It implies training on the entire generated dataset: 'trained for a sufficient amount of epochs until they achieve perfect training accuracy on a synthesis orthonormal Gaussian mixture dataset'. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments described in Appendix B. |
| Software Dependencies | No | The paper mentions training with gradient descent (GD) but does not specify any software frameworks (e.g., PyTorch, TensorFlow) or their version numbers, nor any other ancillary software dependencies with specific versions. |
| Experiment Setup | Yes | Appendix B.1: 'We run GD with step size 0.2 on a synthetic GMM dataset of size n = 5000 with D = 1000, K1 = 5, K2 = 5, α = 0.1, and keep track of the following:' and 'The initialization scale is ϵ = 10⁻⁷'. Appendix B.2: 'small initialization (all weight entries are randomly initialized as N(0, 1×10⁻⁴))', 'large initialization scale, where all weight entries are randomly initialized as N(0, 0.25)'. It also states 'All networks here are trained for a sufficient amount of epochs until they achieve perfect training accuracy'. |
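Since the paper releases no code, the setup quoted above can only be approximated. Below is a minimal NumPy sketch of the Appendix B experiment: sampling a balanced GMM with orthonormal cluster centers and running full-batch GD on a two-layer pReLU network. Everything not quoted in the table is an assumption: the pReLU form σ(z) = max(z, 0)^p, the logistic loss, the convention that the first K1 clusters carry label +1, and the toy-scale hyperparameters (D = 20, n = 200, step size 0.05, init scale 0.3), which are deliberately smaller and larger, respectively, than the paper's D = 1000, n = 5000, step size 0.2, ϵ = 10⁻⁷ so that the sketch converges in a few seconds.

```python
import numpy as np


def make_orthonormal_gmm(n, D, K1, K2, alpha, seed=0):
    """Balanced GMM with K = K1 + K2 orthonormal cluster centers.

    Clusters 0..K1-1 are labeled +1, the rest -1 (an assumed convention;
    the paper only states the mixture is balanced). Intra-cluster noise
    is isotropic Gaussian with standard deviation alpha.
    """
    rng = np.random.default_rng(seed)
    K = K1 + K2
    # Orthonormal centers: orthonormal columns of the Q factor of a
    # random Gaussian matrix (reduced QR).
    Q, _ = np.linalg.qr(rng.standard_normal((D, K)))
    mus = Q.T                                    # (K, D), rows orthonormal
    per = n // K
    X = np.vstack([mus[k] + alpha * rng.standard_normal((per, D))
                   for k in range(K)])
    y = np.concatenate([np.full(per, 1.0 if k < K1 else -1.0)
                        for k in range(K)])
    return X, y, mus


def train_gd(X, y, m=64, p=3, lr=0.05, steps=2000, init_scale=0.3, seed=0):
    """Full-batch GD on f(x) = sum_j v_j * max(w_j . x, 0)**p (assumed
    pReLU form, p > 2 as in the theorem) with logistic loss."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    W = init_scale * rng.standard_normal((m, D))
    v = init_scale * rng.standard_normal(m)
    losses = []
    for _ in range(steps):
        Z = X @ W.T                              # (n, m) pre-activations
        H = np.maximum(Z, 0.0) ** p              # pReLU activations
        f = H @ v                                # network outputs, (n,)
        losses.append(np.logaddexp(0.0, -y * f).mean())
        # d(logistic)/df = -y * sigmoid(-y f); tanh form avoids overflow.
        g = -y * 0.5 * (1.0 - np.tanh(y * f / 2.0)) / n
        dv = H.T @ g
        dW = (p * np.maximum(Z, 0.0) ** (p - 1)
              * g[:, None] * v[None, :]).T @ X
        W -= lr * dW
        v -= lr * dv
    f_final = (np.maximum(X @ W.T, 0.0) ** p) @ v
    acc = float(np.mean(np.sign(f_final) == y))
    return W, v, losses, acc
```

A typical usage, mirroring the paper's K1 = K2 balanced setting at toy scale: `X, y, mus = make_orthonormal_gmm(n=200, D=20, K1=2, K2=2, alpha=0.1)` followed by `train_gd(X, y)`, checking that the training loss decreases and training accuracy reaches (near-)perfect, which is the stopping criterion the paper quotes.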