Approximation to Smooth Functions by Low-Rank Swish Networks

Authors: Zimeng Li, Hongjun Li, Jingyuan Wang, Ke Tang

ICML 2025

Reproducibility Assessment (variable, result, and LLM response per entry)
Research Type: Experimental
LLM Response: In this paper, we establish the theoretical foundation for low-rank compression from the perspective of universal approximation theory. Specifically, we prove that for any Hölder function, there exists a Swish network with narrow linear hidden layers sandwiched between adjacent nonlinear layers, which can approximate the Hölder function within a given small error. ... Extensive experiments have confirmed the reliability of our theoretical result. ... Table 1. Cross-validation results for classical feedforward networks and low-rank networks on various classification (top) and regression (bottom) datasets.
Researcher Affiliation: Academia
LLM Response: 1 School of Computer Science and Engineering, Beihang University, Beijing, China; 2 Institute of Economics (School of Social Sciences), Tsinghua University, Beijing, China; 3 School of Economics and Management, Beihang University, Beijing, China; 4 Engineering Research Center of Advanced Computer Application Technology, Ministry of Education, China. Correspondence to: Hongjun Li <EMAIL>, Jingyuan Wang <EMAIL>.
Pseudocode: No
LLM Response: The paper includes 'Proof Ideas' and 'Technical Proofs' sections, which describe steps and mathematical derivations. However, these are presented as narrative text, lemmas, and propositions rather than as structured pseudocode blocks or algorithms with explicit labels such as 'Algorithm 1' or 'Pseudocode'.
Open Source Code: No
LLM Response: The paper does not contain any explicit statement about releasing source code, nor does it provide links to a code repository in the main text or supplementary materials. The 'Impact Statement' section does not mention code availability.
Open Datasets: Yes
LLM Response: We choose eight popular UCI datasets, four of which are used for classification tasks and four for regression tasks. For each dataset, we convert each category feature to several dummy features, then scale all features to [0, 1]. For regression datasets, we also scale the targets to [0, 1]. Table 2 in Appendix B records the basic information for these datasets.
Dataset Splits: Yes
LLM Response: Then, for each dataset, we employ grid search with 10-fold cross-validation to identify the optimal depth and width for the classical feedforward Swish network. ... Subsequently, we conduct 10-fold cross-validation to evaluate the classical feedforward Swish network of the optimal depth and width and the low-rank Swish network...
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware used for running the experiments. It describes the experimental methodology and dataset usage but makes no mention of CPU or GPU models or other computing infrastructure.
Software Dependencies: No
LLM Response: The paper mentions experimental procedures such as 'grid search' and 'dependent t-tests' but does not specify any software libraries, frameworks (e.g., PyTorch, TensorFlow), or their version numbers used for implementation or analysis.
Experiment Setup: Yes
LLM Response: Then, for each dataset, we employ grid search with 10-fold cross-validation to identify the optimal depth and width for the classical feedforward Swish network. The candidate set for the depth consists of {2, 3, 4}, and for the width, it is {4d, 5d, 6d}, where d represents the input dimension.
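The paper's central construction, as quoted under Research Type, is a Swish network in which a dense connection is replaced by a narrow linear hidden layer sandwiched between adjacent nonlinear layers. A minimal NumPy sketch of one such low-rank block (all sizes and weights here are hypothetical, chosen only for illustration):

```python
import numpy as np

def swish(x):
    # Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d, width, rank = 8, 32, 4  # hypothetical input dim, layer width, and rank

# A full width-by-width connection (32*32 = 1024 weights) is replaced by
# the factorization U @ V (2*32*4 = 256 weights): a narrow linear hidden
# layer of size `rank` between the two nonlinear (Swish) layers.
W1 = rng.standard_normal((width, d))
U = rng.standard_normal((width, rank))
V = rng.standard_normal((rank, width))

x = rng.standard_normal(d)
h = swish(W1 @ x)        # first nonlinear layer
h = swish(U @ (V @ h))   # low-rank connection through the linear bottleneck
print(h.shape)           # (32,)
```

The output width is unchanged; only the number of parameters in the connection shrinks, which is the compression the approximation theorem is meant to justify.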
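The Open Datasets row describes the preprocessing: each categorical feature becomes several dummy features, and all features (and regression targets) are min-max scaled to [0, 1]. A self-contained sketch of that pipeline on toy data (the values below are made up, not from the actual UCI datasets):

```python
import numpy as np

# Toy mixed-type column pair standing in for a UCI dataset (hypothetical values).
categories = np.array(["red", "blue", "red", "green"])
numeric = np.array([3.0, 10.0, 7.0, 5.0])

# One dummy (one-hot) feature per category level.
levels = sorted(set(categories))                             # ['blue', 'green', 'red']
dummies = (categories[:, None] == np.array(levels)).astype(float)

# Min-max scale numeric features (and regression targets) to [0, 1].
scaled = (numeric - numeric.min()) / (numeric.max() - numeric.min())

X = np.column_stack([dummies, scaled])
print(X.shape)  # (4, 4): three dummy columns plus one scaled numeric column
```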
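The Experiment Setup row specifies a grid search with 10-fold cross-validation over depths {2, 3, 4} and widths {4d, 5d, 6d}. The selection loop can be sketched as follows; the network training itself is stubbed out, so this only illustrates the search scaffolding (the data and the scoring stub are hypothetical):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d = 4                           # input dimension of a hypothetical dataset
depths = [2, 3, 4]              # candidate depths from the paper
widths = [4 * d, 5 * d, 6 * d]  # candidate widths {4d, 5d, 6d}

X = rng.standard_normal((100, d))
y = rng.standard_normal(100)

def cv_score(depth, width, X, y, k=10):
    """10-fold cross-validation skeleton: score a (depth, width) configuration
    on each held-out fold and average the errors. Actual network training is
    stubbed out, so `depth` and `width` go unused in this sketch."""
    folds = np.array_split(np.arange(len(X)), k)
    errors = []
    for val_idx in folds:
        # ... train a Swish network of this depth/width on the other folds ...
        errors.append(float(np.mean(y[val_idx] ** 2)))  # stub validation error
    return float(np.mean(errors))

# Grid search: keep the (depth, width) pair with the lowest CV error.
best_depth, best_width = min(itertools.product(depths, widths),
                             key=lambda dw: cv_score(dw[0], dw[1], X, y))
print(best_depth, best_width)
```

Per the paper, the winning configuration is then re-evaluated with 10-fold cross-validation against the corresponding low-rank network.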