Neural Network Architecture Beyond Width and Depth
Authors: Shijun Zhang, Zuowei Shen, Haizhao Yang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we use numerical experimentation to show the advantages of the super-approximation power of ReLU NestNets. |
| Researcher Affiliation | Academia | Zuowei Shen Department of Mathematics National University of Singapore EMAIL Haizhao Yang Department of Mathematics University of Maryland, College Park EMAIL Shijun Zhang Department of Mathematics National University of Singapore EMAIL |
| Pseudocode | No | The paper describes the network architecture and mathematical definitions but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention releasing source code for the described methodology or provide any links to a code repository. |
| Open Datasets | Yes | We will design convolutional neural network (CNN) architectures activated by ReLU or the subnetwork activation function ϱ given in Equation (4) to classify image samples in Fashion-MNIST [47]. |
| Dataset Splits | No | The paper specifies training and test sample counts: For each i ∈ {0,1}, we randomly choose 3×10^5 training samples and 3×10^4 test samples in S_i with label i. For Fashion-MNIST, it states: This dataset consists of a training set of 6×10^4 samples and a test set of 10^4 samples. However, it does not explicitly mention a validation split. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using RAdam [23] as the optimization method, but it does not specify any software names with version numbers for libraries, frameworks, or environments. |
| Experiment Setup | Yes | The number of epochs and the batch size are set to 500 and 512, respectively. We adopt RAdam [23] as the optimization method. In epochs 5(i−1)+1 to 5i for i = 1, 2, ..., 100, the learning rate is 0.2×0.002×0.9^(i−1) for the parameters in ϱ and 0.002×0.9^(i−1) for all other parameters. |
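The quoted setup describes a stepped exponential learning-rate schedule: epochs are grouped into blocks of 5, and within block i the rate is a base rate scaled by 0.9^(i−1), with the parameters of ϱ using a base rate 0.2 times that of the other parameters. The following is a minimal sketch of that schedule, assuming the quoted expressions mean 0.002×0.9^(i−1) and 0.2×0.002×0.9^(i−1); the function names are illustrative, not from the paper.

```python
def learning_rate(epoch: int, base_lr: float) -> float:
    """Learning rate for a 1-indexed epoch (1..500) under the quoted schedule.

    Epochs 5(i-1)+1 .. 5i form block i (i = 1..100); the rate in block i
    is base_lr * 0.9**(i - 1).
    """
    i = (epoch - 1) // 5 + 1  # block index i = 1, ..., 100
    return base_lr * 0.9 ** (i - 1)


# Per the quoted text: parameters in the subnetwork activation ϱ use a base
# rate of 0.2 * 0.002; all other parameters use a base rate of 0.002.
def rho_lr(epoch: int) -> float:
    return learning_rate(epoch, 0.2 * 0.002)


def other_lr(epoch: int) -> float:
    return learning_rate(epoch, 0.002)
```

In a framework such as PyTorch, this would typically be realized with two parameter groups and a per-group decay applied every 5 epochs.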