Convex Formulations for Training Two-Layer ReLU Neural Networks
Authors: Karthik Prakhya, Tolga Birdal, Alp Yurtsever
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then experimentally evaluate the tightness of this relaxation, demonstrating its competitive performance in test accuracy across a range of classification tasks. Section 5, Numerical Experiments: We conduct a series of experiments over synthetic and real datasets to investigate the empirical tightness of our SDP relaxation. |
| Researcher Affiliation | Academia | Karthik Prakhya* and Alp Yurtsever*: *Department of Mathematics and Mathematical Statistics, Umeå University, Sweden; Tolga Birdal: Department of Computing, Imperial College London, United Kingdom |
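| Pseudocode | Yes | Then, we can apply TOS for the rounding, starting from an initial estimate $\lambda_0 \in \mathbb{R}^p \times \mathbb{R}$ and iteratively updating it by the following formula: $\bar{\lambda}_k = \mathrm{proj}_{\mathcal{D}_1}(\lambda_k)$; $\hat{\lambda}_k = \mathrm{proj}_{\mathcal{D}_2}\big(2\bar{\lambda}_k - \lambda_k - \eta \nabla \phi(\bar{\lambda}_k)\big)$; $\lambda_{k+1} = \lambda_k - \bar{\lambda}_k + \hat{\lambda}_k$ (TOS) |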
| Open Source Code | Yes | We make our implementation available under https://github.com/Karthik Prakhya/SDPNN-IW. |
| Open Datasets | Yes | Iris (Fisher, 1936): The dataset consists of 150 samples of iris flowers from three different classes; each sample is described by four features. The training partition includes a data matrix $X \in \mathbb{R}^{75 \times 4}$ and one-hot encoded labels $Y \in \mathbb{R}^{75 \times 3}$. Ionosphere (Sigillito et al., 1989): A radar dataset with 351 instances and 34 input features for binary classification, with class imbalance. Pima Indians Diabetes (Smith et al., 1988): A diabetes prediction dataset of 768 patients, with 8 medical predictors as features and a binary classification task, with class imbalance. Bank Notes Authentication (Lohweg, 2012): A binary classification dataset with 1372 instances, where features extracted using wavelet transforms are used to determine whether a banknote is genuine or forged. MNIST (LeCun et al., 2010): A down-sampled and dimensionally reduced version of the popular image classification dataset, consisting of 1000 instances with 20 features each, obtained using Principal Component Analysis (PCA). |
| Dataset Splits | Yes | All real datasets are split into 50% train and 50% test sets. |
| Hardware Specification | Yes | The experiments were conducted on an Intel Xeon Gold 6132 with 192 GB of RAM and 2×14 cores. |
| Software Dependencies | No | Our SDP formulations were solved by CVXPY (Diamond & Boyd, 2016), employing either the interior point method (IPM) solver by MOSEK (Andersen & Andersen, 2000), or the Splitting Conic Solver (SCS) (O'Donoghue et al., 2016), depending on the problem size. |
| Experiment Setup | Yes | Initial LR = $10^{-5}$; # of iterations = 500K. Spiral: Initial LR = $10^{-3}$; # of iterations = 8K. Iris: Initial LR = $10^{-6}$; # of iterations = 2M. Ionosphere: Initial LR = $10^{-6}$; # of iterations = 2M for γ = 0.1 and 5M for γ = 0.01. Pima Indians Diabetes: Initial LR = $10^{-8}$; # of iterations = 5M for γ = 0.1 and 6M for γ = 0.01. Bank Notes Authentication: Initial LR = $10^{-6}$; # of iterations = 5M. MNIST: Initial LR = $10^{-7}$; # of iterations = 8M. ... The TOS step size is set as $\eta = 1/\lVert \Lambda \rVert_2$, where $\Lambda$ denotes the solution to the SDP-NN, and the algorithm is run for 1000 iterations. |
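
The TOS rounding step in the Pseudocode row follows the three-operator splitting template. Each iteration does three things:

1. Project the running iterate onto $\mathcal{D}_1$.
2. Take a gradient step from the reflected point and project the result onto $\mathcal{D}_2$.
3. Correct the running iterate.

The sketch below is a minimal, hypothetical illustration and not the authors' implementation. The paper's objective $\phi$ and its sets $\mathcal{D}_1$, $\mathcal{D}_2$ are not reproduced. Instead, it assumes the following toy setting, chosen because the answer is known in closed form, $\lambda_i^\star = \min(\max(b_i/q_i, 0), 1)$:

- a separable quadratic $\phi(\lambda) = \tfrac12 \sum_i q_i \lambda_i^2 - \sum_i b_i \lambda_i$ with $q_i > 0$;
- $\mathcal{D}_1 = \{\lambda \ge 0\}$;
- $\mathcal{D}_2 = \{\lambda \le 1\}$.

By analogy with the $\eta = 1/\lVert \Lambda \rVert_2$ rule in the Experiment Setup row, the step size is the reciprocal of the Lipschitz constant of $\nabla\phi$. For this toy quadratic that constant is $\lVert \mathrm{diag}(q) \rVert_2 = \max_i q_i$. The default of 1000 iterations matches the reported setting. The helper names are illustrative.

```python
def proj_nonneg(v):
    """Project onto the hypothetical set D1 = {x : x >= 0} (componentwise)."""
    return [max(vi, 0.0) for vi in v]


def proj_upper(v, upper=1.0):
    """Project onto the hypothetical set D2 = {x : x <= upper} (componentwise)."""
    return [min(vi, upper) for vi in v]


def grad_phi(x, q, b):
    """Gradient of the toy objective phi(x) = 0.5 * sum(q_i x_i^2) - sum(b_i x_i)."""
    return [qi * xi - bi for qi, xi, bi in zip(q, x, b)]


def tos(q, b, iters=1000):
    """Three-operator splitting for min phi(x) s.t. x in D1 and x in D2 (toy setting)."""
    # Step size 1/L, where L = ||diag(q)||_2 = max_i |q_i| is the Lipschitz
    # constant of grad_phi; any eta in (0, 2/L) is admissible for this scheme.
    eta = 1.0 / max(abs(qi) for qi in q)
    lam = [0.0] * len(q)  # initial estimate lambda_0
    for _ in range(iters):
        lam_bar = proj_nonneg(lam)  # bar-lambda_k = proj_D1(lambda_k)
        g = grad_phi(lam_bar, q, b)
        reflected = [2.0 * lb - l - eta * gi for lb, l, gi in zip(lam_bar, lam, g)]
        lam_hat = proj_upper(reflected)  # hat-lambda_k = proj_D2(reflected point)
        lam = [l - lb + lh for l, lb, lh in zip(lam, lam_bar, lam_hat)]
    # The projected iterate proj_D1(lambda_k) serves as the solution estimate.
    return proj_nonneg(lam)


if __name__ == "__main__":
    q = [2.0, 1.0, 4.0]
    b = [1.0, -3.0, 10.0]
    closed_form = [min(max(bi / qi, 0.0), 1.0) for qi, bi in zip(q, b)]
    print("TOS:        ", tos(q, b))
    print("closed form:", closed_form)  # [0.5, 0.0, 1.0]
```

Because the toy problem is separable and strongly convex, the iterates settle quickly. On the paper's actual rounding problem, convergence behaviour will depend on the true $\phi$ and constraint sets.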
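
The Dataset Splits row reports a 50% train / 50% test split for all real datasets. This is consistent with the 75-sample Iris training partition. A minimal sketch of such a split follows, assuming a seeded uniform shuffle. The paper does not state whether it shuffles, stratifies by class, or how it rounds odd sample counts. The `split_half` helper and its floor-rounding convention (the extra sample goes to the test set) are therefore hypothetical.

```python
import random


def split_half(n_samples, seed=0):
    """Hypothetical 50/50 split: return sorted (train_idx, test_idx) index lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded shuffle for reproducibility
    n_train = n_samples // 2  # floor: an odd leftover sample goes to the test set
    return sorted(idx[:n_train]), sorted(idx[n_train:])


if __name__ == "__main__":
    sizes = [
        ("Iris", 150),
        ("Ionosphere", 351),
        ("Pima Indians Diabetes", 768),
        ("Bank Notes Authentication", 1372),
        ("MNIST subset", 1000),
    ]
    for name, n in sizes:
        train, test = split_half(n)
        print(f"{name}: {len(train)} train / {len(test)} test")
```

With this convention, Ionosphere's 351 instances give 175 training and 176 test samples. A stratified split would be a reasonable alternative for the class-imbalanced Ionosphere and Pima datasets.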