Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Power
Authors: Lijia Yu, Yibo Miao, Yifan Zhu, Xiao-Shan Gao, Lijun Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we give some simple experiments to validate our theoretical conclusions. Our experimental setup is as follows. We used the MNIST dataset and two-layer networks with ReLU activation function. When training the network, we ensure that the absolute value of each parameter is smaller than 1 by weight-clipping after each gradient descent. Two experiments are considered: About size and accuracy: For networks with widths 100, 200, ..., 900, 1000, we observe their accuracy on the test set after training. The results are shown in Figure 1. About data and precision: Using training sets with 10%, 20%, ..., 90%, 100% of the original training set to train a network with widths 200, 400 and 600. The results are shown in Figure 2. |
| Researcher Affiliation | Academia | Lijia Yu (1), Yibo Miao (2, 3), Yifan Zhu (2, 3), Xiao-Shan Gao (2, 3), Lijun Zhang (1, 3, 4). (1) Key Laboratory of System Software of Chinese Academy of Sciences, Institute of Software, Chinese Academy of Sciences; (2) State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; (3) University of Chinese Academy of Sciences; (4) Institute of AI for Industries, Chinese Academy of Sciences |
| Pseudocode | No | The paper describes methods using mathematical formulations and natural language, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | In this section, we give some simple experiments to validate our theoretical conclusions. Our experimental setup is as follows. We used the MNIST dataset and two-layer networks with ReLU activation function. |
| Dataset Splits | Yes | About data and precision: Using training sets with 10%, 20%, ..., 90%, 100% of the original training set to train a network with widths 200, 400 and 600. |
| Hardware Specification | No | The paper describes experimental setup and results but does not specify any hardware details like GPU/CPU models or other computing resources. |
| Software Dependencies | No | The paper describes experimental setup but does not mention specific software names with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | Our experimental setup is as follows. We used the MNIST dataset and two-layer networks with ReLU activation function. When training the network, we ensure that the absolute value of each parameter is smaller than 1 by weight-clipping after each gradient descent. Two experiments are considered: About size and accuracy: For networks with widths 100, 200, ..., 900, 1000, we observe their accuracy on the test set after training. The results are shown in Figure 1. About data and precision: Using training sets with 10%, 20%, ..., 90%, 100% of the original training set to train a network with widths 200, 400 and 600. The results are shown in Figure 2. |
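The key training constraint quoted above is weight clipping: after every gradient-descent step, each parameter is forced into [-1, 1]. The paper releases no code, so the sketch below is a hypothetical NumPy reconstruction of that loop for a two-layer ReLU network; the dimensions, learning rate, squared loss, and random data are illustrative stand-ins (the paper uses MNIST with widths 100 to 1000), not the authors' actual setup.

```python
import numpy as np

# Hypothetical sketch: two-layer ReLU network trained by gradient descent,
# with every parameter clipped to [-1, 1] after each step, as the paper
# describes. Random data stands in for MNIST; all sizes are illustrative.

rng = np.random.default_rng(0)
d_in, width, d_out = 8, 16, 10   # MNIST would use d_in=784, width in {100,...,1000}
W1 = rng.standard_normal((d_in, width)) * 0.1
W2 = rng.standard_normal((width, d_out)) * 0.1

X = rng.standard_normal((32, d_in))
Y = np.eye(d_out)[rng.integers(0, d_out, size=32)]  # one-hot targets
lr = 0.1

for _ in range(5):
    # forward pass: hidden ReLU layer, linear output, mean squared error
    H = np.maximum(X @ W1, 0.0)
    P = H @ W2
    G = 2.0 * (P - Y) / len(X)        # dLoss/dP

    # backward pass through the two layers
    dW2 = H.T @ G
    dW1 = X.T @ ((G @ W2.T) * (H > 0))

    # gradient step followed by weight clipping to [-1, 1]
    W1 = np.clip(W1 - lr * dW1, -1.0, 1.0)
    W2 = np.clip(W2 - lr * dW2, -1.0, 1.0)
```

After every step the invariant `abs(W).max() <= 1` holds for both layers, which is the bounded-parameter condition the paper's theory assumes.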