Approximation to Smooth Functions by Low-Rank Swish Networks
Authors: Zimeng Li, Hongjun Li, Jingyuan Wang, Ke Tang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we establish the theoretical foundation for low-rank compression from the perspective of universal approximation theory. Specifically, we prove that for any Hölder function, there exists a Swish network with narrow linear hidden layers sandwiched between adjacent nonlinear layers, which can approximate the Hölder function within a given small error. ... Extensive experiments have confirmed the reliability of our theoretical result. ... Table 1. Cross-validation results for classical feedforward networks and low-rank networks on various classification (top) and regression (bottom) datasets. |
| Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Beihang University, Beijing, China 2Institute of Economics (School of Social Sciences), Tsinghua University, Beijing, China 3School of Economics and Management, Beihang University, Beijing, China 4Engineering Research Center of Advanced Computer Application Technology, Ministry of Education, China. Correspondence to: Hongjun Li <EMAIL>, Jingyuan Wang <EMAIL>. |
| Pseudocode | No | The paper includes 'Proof Ideas' and 'Technical Proofs' sections which describe steps and mathematical derivations. However, these are presented as narrative text, lemmas, and propositions rather than structured pseudocode blocks or algorithms with clear labels like 'Algorithm 1' or 'Pseudocode'. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository in the main text or supplementary materials. The 'Impact Statement' section does not mention code availability. |
| Open Datasets | Yes | We choose eight popular UCI datasets, four of which are used for classification tasks and four for regression tasks. For each dataset, we convert each category feature to several dummy features, then scale all features to [0, 1]. For regression datasets, we also scale the targets to [0, 1]. Table 2 in Appendix B records the basic information for these datasets. |
| Dataset Splits | Yes | Then, for each dataset, we employ grid search with 10-fold cross-validation to identify the optimal depth and width for the classical feedforward Swish network. ... Subsequently, we conduct 10-fold cross-validation to evaluate the classical feedforward Swish network of the optimal depth and width and the low-rank Swish network... |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It describes the experimental methodology and dataset usage but makes no mention of CPU, GPU models, or other computing infrastructure. |
| Software Dependencies | No | The paper mentions experimental procedures such as 'grid search' and 'dependent t-tests' but does not specify any software libraries, frameworks (e.g., PyTorch, TensorFlow), or their version numbers that were used for implementation or analysis. |
| Experiment Setup | Yes | Then, for each dataset, we employ grid search with 10-fold cross-validation to identify the optimal depth and width for the classical feedforward Swish network. The candidate set for the depth consists of {2, 3, 4}, and for the width, it is {4d, 5d, 6d}, where d represents the input dimension. |
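The quoted setup can be sketched in code: the Swish activation, the candidate (depth, width) grid from {2, 3, 4} × {4d, 5d, 6d}, and a 10-fold split. This is a minimal illustrative sketch in NumPy; the helper names and the fold construction are assumptions, not the authors' implementation.

```python
import numpy as np

def swish(x):
    # Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def candidate_grid(d):
    # Candidate (depth, width) pairs per the quoted setup:
    # depths {2, 3, 4}, widths {4d, 5d, 6d} for input dimension d
    return [(depth, k * d) for depth in (2, 3, 4) for k in (4, 5, 6)]

def kfold_indices(n, k=10, seed=0):
    # Shuffled k-fold split of n samples into (train, validation) index pairs
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    return [
        (np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
        for i in range(k)
    ]
```

Each of the nine (depth, width) candidates would then be scored by averaging validation performance over the ten folds, and the best candidate compared against the low-rank network under the same cross-validation.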