Gradient Flow Provably Learns Robust Classifiers for Orthonormal GMMs

Authors: Hancheng Min, Rene Vidal

ICML 2025

Reproducibility assessment (variable, result, LLM response):
Research Type: Experimental. From the paper: "This paper shows that for certain data distributions one can learn a provably robust classifier using standard learning methods and without adding a defense mechanism. More specifically, this paper addresses the problem of finding a robust classifier for a binary classification problem in which the data comes from an isotropic mixture of Gaussians with orthonormal cluster centers..." and "Our second set of results is to develop a full convergence analysis for gradient flow on a two-layer pReLU network and show that: Theorem (Theorem 1 & Corollary 1, informal). When the intra-cluster variance α^2 is sufficiently small, gradient flow on pReLU networks (5) with p > 2 converges to a nearly optimal robust classifier." The paper also includes "Appendix B. Additional Experiments on Learning Robust Classifiers for Data from Orthonormal GMMs".
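The summary above does not define the pReLU activation; a common reading, assumed here, is the ReLU raised to the power p, with the two-layer architecture f(x) = vᵀ pReLU(Wx). The function names and hidden width below are illustrative, not from the paper:

```python
import numpy as np

def prelu(z, p=3.0):
    # Assumed activation: ReLU raised to the power p (the theorem requires p > 2).
    return np.maximum(z, 0.0) ** p

def two_layer_prelu(x, W, v, p=3.0):
    # Assumed architecture: f(x) = v^T pReLU(W x), with hidden weights W
    # and output weights v.
    return v @ prelu(W @ x, p)

rng = np.random.default_rng(0)
D, h = 1000, 50                          # input dimension from the paper; width h is illustrative
W = rng.normal(0.0, 1e-7, size=(h, D))   # small initialization scale, as in Appendix B.1
v = rng.normal(0.0, 1e-7, size=h)
x = rng.normal(size=D)
print(two_layer_prelu(x, W, v))
```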
Researcher Affiliation: Academia. 1Center for Innovation in Data Engineering and Science (IDEAS), 2Department of Electrical and Systems Engineering, 3Department of Radiology, University of Pennsylvania, Philadelphia, U.S.A. Correspondence to: Hancheng Min <EMAIL>.
Pseudocode: No. The paper describes mathematical derivations and theoretical proofs related to gradient flow and network architectures. It does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide any concrete access to source code for the methodology described. There are no links to repositories, nor explicit statements about code release.
Open Datasets: No. The paper uses a synthetic dataset generated from the 'Orthonormal Gaussian Mixture Model' described within the paper. It specifies 'Consider a balanced mixture of K Gaussians in RD' and later 'synthetic GMM dataset of size n = 5000'. It does not refer to a publicly available, pre-existing dataset with a link, DOI, or formal citation.
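The synthetic data described above can be sketched directly. Two details are assumptions not fixed by this summary: the orthonormal cluster centers are taken to be standard basis vectors, and the first K1 clusters are assigned the positive label:

```python
import numpy as np

def sample_orthonormal_gmm(n, D, K1, K2, alpha, seed=0):
    # Sketch of the paper's synthetic data: K = K1 + K2 isotropic Gaussian
    # clusters with orthonormal means (assumed: standard basis vectors),
    # intra-cluster std alpha, balanced cluster assignment in expectation.
    rng = np.random.default_rng(seed)
    K = K1 + K2
    mus = np.eye(D)[:K]                     # orthonormal cluster centers
    ks = rng.integers(0, K, size=n)         # uniform cluster assignment
    X = mus[ks] + alpha * rng.normal(size=(n, D))
    y = np.where(ks < K1, 1.0, -1.0)        # assumed label split
    return X, y

# Parameters reported in Appendix B.1
X, y = sample_orthonormal_gmm(n=5000, D=1000, K1=5, K2=5, alpha=0.1)
print(X.shape, y.shape)
```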
Dataset Splits: No. The paper describes generating a balanced dataset D̂ = {(x_i, y_i)}_{i=1}^{KN} of a certain size ('n = 5000' or '20000' in experiments). While it defines the characteristics of this synthetic dataset, it does not provide explicit training/test/validation splits for experiment reproduction. It implies training on the entire generated dataset: 'trained for a sufficient amount of epochs until they achieve perfect training accuracy on a synthesis orthonormal Gaussian mixture dataset'.
Hardware Specification: No. The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments described in Appendix B.
Software Dependencies: No. The paper mentions using 'gradient descent (SGD)' but does not specify any software frameworks (e.g., PyTorch, TensorFlow) or their version numbers, nor any other ancillary software dependencies with specific versions.
Experiment Setup: Yes. Appendix B.1: 'We run GD with step size 0.2 on a synthetic GMM dataset of size n = 5000 with D = 1000, K1 = 5, K2 = 5, α = 0.1, and keep track of the following:' and 'The initialization scale is ϵ = 10^-7'. Appendix B.2: 'small initialization (all weight entries are randomly initialized as N(0, 1/10^4))', 'large initialization scale, where all weight entries are randomly initialized as N(0, 0.25)'. It also states 'All networks here are trained for a sufficient amount of epochs until they achieve perfect training accuracy'.
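Putting the reported hyperparameters together, the training loop might look like the following sketch. The logistic loss, the two-layer pReLU architecture, the epoch count, and the placeholder data are all assumptions; dimensions are scaled down from n = 5000, D = 1000 so the sketch runs quickly:

```python
import numpy as np

# Hypothetical reconstruction of the Appendix B.1 setup: full-batch GD with
# step size 0.2 and small initialization scale eps = 1e-7.
rng = np.random.default_rng(0)
n, D, h, p = 200, 50, 20, 3.0            # scaled-down sizes; h and p illustrative
lr, eps = 0.2, 1e-7                      # step size and init scale from the paper

X = rng.normal(size=(n, D))              # placeholder data (not the paper's GMM sampler)
y = rng.choice([-1.0, 1.0], size=n)
W = eps * rng.normal(size=(h, D))
v = eps * rng.normal(size=h)

for epoch in range(100):
    Z = X @ W.T                          # (n, h) pre-activations
    A = np.maximum(Z, 0.0) ** p          # assumed pReLU activation
    f = A @ v                            # network outputs, shape (n,)
    g = -y / (1.0 + np.exp(y * f))       # d(logistic loss)/d f
    dv = A.T @ g / n                     # gradient w.r.t. output weights
    dZ = np.outer(g, v) * p * np.maximum(Z, 0.0) ** (p - 1)
    W -= lr * (dZ.T @ X) / n             # gradient step on hidden weights
    v -= lr * dv
```

With p = 3 and eps = 1e-7 the outputs start near zero, matching the small-initialization regime the paper's analysis focuses on.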