Robust Feature Learning for Multi-Index Models in High Dimensions
Authors: Alireza Mousavi-Hosseini, Adel Javanmard, Murat A Erdogdu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a proof of concept, we also provide small-scale numerical studies on Gaussian data to support intuitions derived from our theory. Additional experiments on real datasets are provided in Appendix E. [...] The first row of Figure 1 compares the performance of three different approaches. [...] As can be seen from Table 1, the STD + ADV training approach achieves a higher test accuracy compared to the ADV approach across all model architectures considered here. |
| Researcher Affiliation | Academia | 1University of Toronto, 2Vector Institute, 3University of Southern California, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Adversarially robust learning with two-layer NNs. [...] Algorithm 2 Gradient-Based Feature Learner for Single-Index Polynomials (Oko et al., 2024, Algorithm 1, Phase I). [...] Algorithm 3 Gradient-Based Feature Learner for Multi-Index Polynomials (Damian et al., 2022, Algorithm 1, Adapted) |
| Open Source Code | Yes | The code to reproduce the results of Figure 1 and Table 1 is provided at: https://github.com/mousavih/robust-feature-learning. |
| Open Datasets | Yes | Additional experiments on real datasets are provided in Appendix E. [...] on the MNIST dataset (LeCun et al., 1998) |
| Dataset Splits | No | To estimate the robust test risk, we fix a test set of 10,000 i.i.d. samples, and use 20 iterations to estimate the adversarial perturbation. [...] For both approaches, we use the cross entropy loss, a batch size of 64, a learning rate of 0.01 for both PGD and SGD updates |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch initialization' but does not specify a version number or other key software components with their versions. |
| Experiment Setup | Yes | We implement adversarial training in the following manner. At each iteration, we sample a new batch of i.i.d. training examples. We estimate the adversarial perturbations on this batch by performing 5 steps of signed projected gradient ascent, with a stepsize of 0.1. We then perform a gradient descent step on the perturbed batch. [...] The student network has N = 100 neurons, and the input is sampled from x ∼ N(0, I_d) with d = 100. [...] For both approaches, we use the cross entropy loss, a batch size of 64, a learning rate of 0.01 for both PGD and SGD updates, and we use the ℓ∞ norm to constrain perturbations, where pixels are normalized between 0 and 1. |
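The adversarial-training recipe quoted in the Experiment Setup row (5 steps of signed projected gradient ascent with stepsize 0.1, followed by a gradient step on the perturbed batch) can be sketched as follows. This is a minimal NumPy illustration on a linear logistic model, not the paper's two-layer network; the perturbation radius `eps`, epoch count, and synthetic data here are placeholder assumptions, while the 5 PGD steps, stepsize 0.1, batch size 64, and learning rate 0.01 follow the quoted setup.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pgd_linf(w, X, y, eps=0.3, step=0.1, n_steps=5):
    """Signed projected gradient ascent on the logistic loss (l_inf ball).

    Mirrors the quoted recipe: 5 signed ascent steps of size 0.1,
    projected back onto the eps-ball around the clean inputs.
    """
    X_adv = X.copy()
    for _ in range(n_steps):
        margin = y * (X_adv @ w)
        # d loss / d x for loss = log(1 + exp(-y w.x))
        grad_x = -(y * sigmoid(-margin))[:, None] * w[None, :]
        X_adv += step * np.sign(grad_x)           # signed ascent step
        X_adv = np.clip(X_adv, X - eps, X + eps)  # l_inf projection
    return X_adv


def adversarial_train(X, y, lr=0.01, epochs=20, batch=64, seed=0):
    """Adversarial training: attack each batch, then descend on it."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            Xb_adv = pgd_linf(w, X[b], y[b])
            margin = y[b] * (Xb_adv @ w)
            # d loss / d w, averaged over the perturbed batch
            grad_w = -((y[b] * sigmoid(-margin))[:, None] * Xb_adv).mean(0)
            w -= lr * grad_w
    return w
```

The structure matches the paper's description at the level of the training loop: a fresh adversarial perturbation is computed for every batch before each parameter update, rather than reusing stale perturbations.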