Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Restricted Strong Convexity of Deep Learning Models with Smooth Activations
Authors: Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Misha Belkin
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We share preliminary experimental results supporting our theoretical advances. [...] In this section, we present experimental results verifying the RSC condition [...] on standard benchmarks: CIFAR-10, MNIST, and Fashion-MNIST. |
| Researcher Affiliation | Academia | Arindam Banerjee Department of Computer Science University of Illinois at Urbana-Champaign EMAIL; Pedro Cisneros-Velarde Department of Computer Science University of Illinois at Urbana-Champaign EMAIL; Libin Zhu Department of Computer Science University of California, San Diego EMAIL; Mikhail Belkin Halıcıoğlu Data Science Institute University of California, San Diego EMAIL |
| Pseudocode | No | The paper focuses on theoretical analysis and proofs but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to a code repository for the methodology described. |
| Open Datasets | Yes | In this section, we present experimental results verifying the RSC condition [...] on standard benchmarks: CIFAR-10, MNIST, and Fashion-MNIST. |
| Dataset Splits | No | The paper mentions '512 randomly chosen training points' and a 'training algorithm' with 'stopping criteria', but it does not specify explicit percentages or counts for training, validation, and test splits, nor does it refer to standard predefined splits for the datasets used beyond just naming them. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the experiments. |
| Experiment Setup | Yes | For the experiments, the network architecture we used was a 3-layer fully connected neural network with tanh activation function. The training algorithm is gradient descent (GD) with a constant learning rate, chosen appropriately to keep the training in the NTK regime. Since we are using GD, we use 512 randomly chosen training points for the experiments. The stopping criterion is either training loss < 10^-3 or number of iterations exceeding 3000. |
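The quoted setup (3-layer fully connected tanh network, full-batch GD with a constant learning rate, 512 training points, stopping at loss < 10^-3 or 3000 iterations) can be sketched as follows. This is a hedged illustration, not the authors' code: the layer widths, learning rate, and loss function are assumptions (the paper does not state them in the quoted passage), and random tensors stand in for the 512 CIFAR-10/MNIST/Fashion-MNIST points to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in data: the paper uses 512 randomly chosen training points from
# CIFAR-10/MNIST/Fashion-MNIST; random tensors keep this sketch runnable.
n, d, k = 512, 32, 10
X = torch.randn(n, d)
y = torch.randint(0, k, (n,))

# 3-layer fully connected network with tanh activations.
# The hidden width is an assumed value, not taken from the paper.
width = 256
model = nn.Sequential(
    nn.Linear(d, width), nn.Tanh(),
    nn.Linear(width, width), nn.Tanh(),
    nn.Linear(width, k),
)

loss_fn = nn.CrossEntropyLoss()
lr = 0.5  # constant learning rate, "chosen appropriately" per the paper
opt = torch.optim.SGD(model.parameters(), lr=lr)  # full-batch step = plain GD

loss = loss_fn(model(X), y)
for it in range(3000):            # stop after at most 3000 iterations...
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    if loss.item() < 1e-3:        # ...or when training loss drops below 10^-3
        break
```

Because every step uses the full 512-point batch, `SGD` here implements deterministic gradient descent; only the two stopping conditions from the quoted setup terminate the loop.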