Convex Formulations for Training Two-Layer ReLU Neural Networks

Authors: Karthik Prakhya, Tolga Birdal, Alp Yurtsever

ICLR 2025

Reproducibility Variable (Result): LLM Response
Research Type (Experimental): We then experimentally evaluate the tightness of this relaxation, demonstrating its competitive performance in test accuracy across a range of classification tasks. From Section 5 (Numerical Experiments): We conduct a series of experiments over synthetic and real datasets to investigate the empirical tightness of our SDP relaxation.
Researcher Affiliation (Academia): Karthik Prakhya and Alp Yurtsever, Department of Mathematics and Mathematical Statistics, Umeå University, Sweden; Tolga Birdal, Department of Computing, Imperial College London, United Kingdom.
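Pseudocode (Yes): Then, we can apply TOS for the rounding, starting from an initial estimate $\lambda_0 \in \mathbb{R}^p \times \mathbb{R}$ and iteratively updating it by the following formula (TOS): $\bar{\lambda}_k = \mathrm{proj}_{D_1}(\lambda_k)$, $\hat{\lambda}_k = \mathrm{proj}_{D_2}\big(2\bar{\lambda}_k - \lambda_k - \eta \nabla\phi(\bar{\lambda}_k)\big)$, $\lambda_{k+1} = \lambda_k - \bar{\lambda}_k + \hat{\lambda}_k$.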
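The TOS update above has the form of Davis-Yin three-operator splitting: project onto $D_1$, take a gradient step on $\phi$ combined with a projection onto $D_2$, then correct the running iterate. The sketch below is a minimal NumPy version under stated assumptions. The excerpt does not give the paper's $\phi$, $D_1$, or $D_2$, so the function name `three_operator_splitting` and the toy problem (quadratic $\phi$, nonnegativity and upper-bound constraints) are hypothetical illustrations rather than the authors' rounding code. For an $L$-smooth $\phi$, the usual convergence condition is $\eta \in (0, 2/L)$.

```python
import numpy as np

def three_operator_splitting(lam0, proj_d1, proj_d2, grad_phi, eta, num_iters=1000):
    """Davis-Yin splitting for: minimize phi(lam) subject to lam in D1 and D2.

    Each iteration performs
        lam_bar = proj_d1(lam)
        lam_hat = proj_d2(2 * lam_bar - lam - eta * grad_phi(lam_bar))
        lam     = lam - lam_bar + lam_hat
    and the function returns proj_d1(lam), the estimate of the minimizer.
    """
    lam = np.array(lam0, dtype=float)
    for _ in range(num_iters):
        lam_bar = proj_d1(lam)
        lam_hat = proj_d2(2.0 * lam_bar - lam - eta * grad_phi(lam_bar))
        lam = lam - lam_bar + lam_hat
    return proj_d1(lam)

# Hypothetical toy problem: phi(lam) = 0.5 * ||lam - c||^2,
# D1 = {lam >= 0}, D2 = {lam <= 1}; the exact answer is np.clip(c, 0, 1).
c = np.array([2.0, -1.0, 0.5])
solution = three_operator_splitting(
    lam0=np.zeros_like(c),
    proj_d1=lambda v: np.maximum(v, 0.0),  # projection onto the nonnegative orthant
    proj_d2=lambda v: np.minimum(v, 1.0),  # projection onto {v <= 1}
    grad_phi=lambda v: v - c,              # gradient of the quadratic, 1-Lipschitz
    eta=1.0,                               # eta = 1 / L with L = 1
)
print(solution)  # matches np.clip(c, 0.0, 1.0)
```

The default of 1000 iterations mirrors the iteration count reported for TOS in the experiment setup; the toy problem reaches its fixed point after a couple of iterations.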
Open Source Code (Yes): We make our implementation available at https://github.com/KarthikPrakhya/SDPNN-IW.
Open Datasets (Yes): Iris (Fisher, 1936): The dataset consists of 150 samples of iris flowers from three different classes; each sample is described by four features. The training partition includes a data matrix $X \in \mathbb{R}^{75 \times 4}$ and one-hot encoded labels $Y \in \mathbb{R}^{75 \times 3}$. Ionosphere (Sigillito et al., 1989): A radar dataset with 351 instances and 34 input features for binary classification, with class imbalance. Pima Indians Diabetes (Smith et al., 1988): A diabetes prediction dataset of 768 patients with 8 medical predictors as features, for a binary classification task with class imbalance. Bank Notes Authentication (Lohweg, 2012): A binary classification dataset with 1372 instances, where features extracted using wavelet transforms are used to determine whether a banknote is genuine or forged. MNIST (LeCun et al., 2010): A down-sampled and dimensionally reduced version of the popular image classification dataset, consisting of 1000 instances, each with 20 features obtained using Principal Component Analysis (PCA).
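The MNIST entry reduces each image to 20 features with PCA. A minimal sketch of that reduction via the SVD of the centered data matrix follows. The excerpt does not state the down-sampled image size or any normalization, so the 196-dimensional random input (a stand-in for 14x14 images) and the helper name `pca_reduce` are assumptions for illustration only.

```python
import numpy as np

def pca_reduce(X, n_components=20):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
X_pixels = rng.random((1000, 196))  # hypothetical stand-in for 1000 down-sampled images
X_reduced = pca_reduce(X_pixels, n_components=20)
print(X_reduced.shape)  # (1000, 20)
```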
Dataset Splits (Yes): All real datasets are split into 50% train and 50% test sets.
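A 50/50 split of Iris yields exactly the training shapes quoted above, $X \in \mathbb{R}^{75 \times 4}$ and one-hot $Y \in \mathbb{R}^{75 \times 3}$. The sketch below assumes a uniformly random, non-stratified split; the excerpt does not say how samples were assigned, and the function `split_half_one_hot` and the random placeholder features are hypothetical.

```python
import numpy as np

def split_half_one_hot(X, labels, num_classes, seed=0):
    """Randomly split (X, labels) 50/50 and one-hot encode the labels."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    n_train = X.shape[0] // 2
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    Y = np.eye(num_classes)[labels]  # row i is the one-hot vector for labels[i]
    return X[train_idx], Y[train_idx], X[test_idx], Y[test_idx]

rng = np.random.default_rng(1)
X_all = rng.normal(size=(150, 4))         # placeholder features with the shape of Iris
labels_all = np.repeat(np.arange(3), 50)  # 3 classes with 50 samples each, as in Iris
X_train, Y_train, X_test, Y_test = split_half_one_hot(X_all, labels_all, num_classes=3)
print(X_train.shape, Y_train.shape)  # (75, 4) (75, 3)
```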
Hardware Specification (Yes): The experiments were conducted on an Intel Xeon Gold 6132 system with 192 GB of RAM and 2 × 14 cores.
Software Dependencies (No): Our SDP formulations were solved by CVXPY (Diamond & Boyd, 2016), employing either the interior point method (IPM) solver MOSEK (Andersen & Andersen, 2000) or the Splitting Conic Solver (SCS) (O'Donoghue et al., 2016), depending on the problem size.
Experiment Setup (Yes): Initial LR = $10^{-5}$; # of iterations = 500K. Spiral: Initial LR = $10^{-3}$; # of iterations = 8K. Iris: Initial LR = $10^{-6}$; # of iterations = 2M. Ionosphere: Initial LR = $10^{-6}$; # of iterations = 2M for $\gamma = 0.1$ and 5M for $\gamma = 0.01$. Pima Indians Diabetes: Initial LR = $10^{-8}$; # of iterations = 5M for $\gamma = 0.1$ and 6M for $\gamma = 0.01$. Bank Notes Authentication: Initial LR = $10^{-6}$; # of iterations = 5M. MNIST: Initial LR = $10^{-7}$; # of iterations = 8M. ... The TOS step size is set as $\eta = 1/\|\Lambda\|_2$, where $\Lambda$ denotes the solution to the SDP-NN, and the algorithm is run for 1000 iterations.
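The settings above can be collected into a small configuration, with the TOS step size $\eta = 1/\|\Lambda\|_2$ computed from the SDP solution. In this sketch, the names `GD_SETUP`, `tos_step_size`, and `TOS_ITERATIONS` are hypothetical. The first, unlabeled entry (LR $10^{-5}$, 500K iterations) is left out because its dataset is not named in the excerpt. $\|\cdot\|_2$ is read here as the spectral norm, which is an assumption. The diagonal $\Lambda$ is a made-up stand-in for an SDP-NN solution.

```python
import numpy as np

# (initial learning rate, number of iterations), keyed by (dataset, gamma);
# gamma is None where the setup gives a single value for all regularization levels.
GD_SETUP = {
    ("Spiral", None): (1e-3, 8_000),
    ("Iris", None): (1e-6, 2_000_000),
    ("Ionosphere", 0.1): (1e-6, 2_000_000),
    ("Ionosphere", 0.01): (1e-6, 5_000_000),
    ("Pima Indians Diabetes", 0.1): (1e-8, 5_000_000),
    ("Pima Indians Diabetes", 0.01): (1e-8, 6_000_000),
    ("Bank Notes Authentication", None): (1e-6, 5_000_000),
    ("MNIST", None): (1e-7, 8_000_000),
}

TOS_ITERATIONS = 1000  # number of TOS rounding iterations reported above

def tos_step_size(Lam):
    """Return eta = 1 / ||Lam||_2, taking ||.||_2 as the spectral norm (an assumption)."""
    return 1.0 / np.linalg.norm(Lam, ord=2)

Lam = np.diag([4.0, 1.0])  # hypothetical SDP solution; its largest singular value is 4
print(tos_step_size(Lam))  # approximately 0.25
```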