Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Nonparametric Learning of Two-Layer ReLU Residual Units
Authors: Zhunxuan Wang, Linyun He, Chunchuan Lyu, Shay B. Cohen
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further prove the strong statistical consistency of our algorithm, and demonstrate its robustness and sample efficiency through experimental results on synthetic data and a set of benchmark regression datasets. In our experiments, we describe a first set of synthetic dataset experiments that show that our algorithm identifies the true parameters from which the data were generated (Section 7.1), and then a second set of experiments that use our algorithm on standard real-world benchmark datasets (Section 7.2). |
| Researcher Affiliation | Collaboration | Zhunxuan Wang EMAIL Amazon London EC2A 2FA, United Kingdom Linyun He EMAIL Georgia Institute of Technology Atlanta, GA 30332, United States Chunchuan Lyu EMAIL Instituto de Telecomunicações Torre Norte 1049-001 Lisbon, Portugal Shay B. Cohen EMAIL University of Edinburgh Edinburgh EH8 9AB, United Kingdom |
| Pseudocode | Yes | Algorithm 1 Learn a ReLU residual unit, layer 2. Algorithm 2 Learn a ReLU residual unit, layer 1. Algorithm 3 Learn a ReLU residual unit by LR. Algorithm 4 Rescale a ĜNPE₂ minimizer. Algorithm 5 Learn a ReLU residual unit, layer 2. |
| Open Source Code | Yes | Our code is available at https://github.com/uuzeeex/relu-resunit-learning. |
| Open Datasets | Yes | The first six datasets are standard benchmark datasets taken from Delve and the UCI Machine Learning Repository. The last dataset, Jigsaw, is bigger, with the goal of using FastText word embeddings of tweets to predict their level of toxicity. |
| Dataset Splits | Yes | In all of our experiments, we report five-fold cross-validation results, where the first fold is used to tune the hyperparameters, and the last fold is used as both a validation set for early stopping (first half) and to report the results (second half). |
| Hardware Specification | Yes | CPU specification: 2.8 GHz Quad-Core Intel Core i7. |
| Software Dependencies | Yes | We use CVX (Grant & Boyd, 2014; 2008) that calls SDPT3 (Toh et al., 1999) (a free solver under GPLv3 license) and solves our convex QP/LP in polynomial time. For this set of experiments, with all datasets, we use the MOSEK solver with an academic license (ApS, 2019). |
| Experiment Setup | Yes | SGD is conducted on mini-batch empirical losses of L(A, B) with batch size 32 for 256 epochs in each learning trial. We apply time-based learning rate decay η = η₀/(1 + γT) with initial rate η₀ = 10⁻³ and decay rate γ = 10⁻⁵, where T is the epoch number. With backpropagation, we use SGD with a learning rate of 0.000001 and batch size of 500. We run backpropagation until the mean squared error does not change between epochs within a fraction of 1/10000. |
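The time-based decay schedule quoted in the experiment setup row can be sketched directly. This is a minimal illustration, not the authors' code; the function name `decayed_lr` is hypothetical, and the default values follow the quoted η₀ = 10⁻³ and γ = 10⁻⁵.

```python
def decayed_lr(epoch: int, eta0: float = 1e-3, gamma: float = 1e-5) -> float:
    """Time-based learning rate decay: eta = eta0 / (1 + gamma * T).

    `epoch` is the epoch number T; eta0 and gamma default to the values
    quoted in the paper's experiment setup (1e-3 and 1e-5).
    """
    return eta0 / (1.0 + gamma * epoch)

# With gamma = 1e-5, the rate decays very slowly over the 256 epochs
# of a learning trial: decayed_lr(0) == 1e-3, and decayed_lr(256) is
# only slightly smaller.
```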
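The dataset-splits row describes an unusual five-fold protocol: the first fold tunes hyperparameters, and the last fold is halved into an early-stopping validation set and a test set. A possible reading of that protocol, with all index logic being my assumption (the function `split_indices` is hypothetical, not from the paper's code):

```python
import numpy as np

def split_indices(n: int, seed: int = 0):
    """Split n examples into the five-fold scheme quoted in the report:
    fold 0 -> hyperparameter tuning; folds 1..3 -> training;
    fold 4 -> first half early-stopping validation, second half test.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), 5)
    tune = folds[0]
    train = np.concatenate(folds[1:-1])
    half = len(folds[-1]) // 2
    early_stop, test = folds[-1][:half], folds[-1][half:]
    return train, tune, early_stop, test
```

Under this reading, for n = 100 the split is 60 training, 20 tuning, 10 early-stopping, and 10 test examples, with the four sets disjoint.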