Optimization for Neural Operators can Benefit from Width
Authors: Pedro Cisneros-Velarde, Bhavesh Shrimali, Arindam Banerjee
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results on canonical operator learning problems to support our theoretical results and find that larger widths benefit training. ... Finally, to complement our theoretical results, we present empirical evaluations of DONs and FNOs and show the benefits of width on learning three popular operators in the literature (Li et al., 2021a; Lu et al., 2021): antiderivative, diffusion-reaction, and Burgers equation. Our experiments show that increasing the width leads to lower training losses and generally leads to faster convergence. |
| Researcher Affiliation | Collaboration | ¹VMware Research, ²University of Illinois Urbana-Champaign. |
| Pseudocode | No | The paper describes the architectures of DONs and FNOs in text and through mathematical equations and schematics, but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | Yes | The associated code for the experiments in Section 8 and the ones presented in this appendix are found in https://github.com/bhaveshshrimali/neuralop_optimization. |
| Open Datasets | Yes | We make use of the datasets publicly available at https://github.com/neuraloperator/neuraloperator, specifically the Burgers R10.mat dataset available at https://drive.google.com/drive/folders/1UnbQh2WWc6knEHbLn-ZaXrKUZhp7pjt-. |
| Dataset Splits | No | The paper mentions generating training data for the Antiderivative and Diffusion-Reaction operators, and a total sample size for the Burgers Equation dataset (2048 input functions). However, it does not provide specific train/test/validation splits (e.g., percentages or counts for each set) or reference standard predefined splits for reproducibility. For example, for Antiderivative: "sample size of the training data is n = 2000", and for Burgers: "comprises of 2048 input functions... We test the trained neural operators on a simple GRF sampled from the training dataset" which doesn't specify the split. |
| Hardware Specification | Yes | We remark that all experiments with widths m ∈ {10, 50} were run on a personal computer with one NVIDIA Quadro GPU, while the rest of the widths were run on Google Colab with single NVIDIA L4 and A100 GPUs. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and Scaled Exponential Linear Unit (SELU) as activation functions but does not specify version numbers for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow, or specific Adam/SELU implementations). |
| Experiment Setup | Yes | We monitor the training process over 80,000 training epochs... We fix the learning rate for the Adam optimizer at 10⁻³ and use full-batch training, i.e., a batch size of 2000 for both DONs and FNOs. ... For all the experiments we use a constant learning rate of 3e-4 and the Adam optimizer with a batch size of 4000. ... For all the experiments we use a constant learning rate of 10⁻³ and the Adam optimizer with a batch size of 800. |
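The setup above fixes the Adam optimizer at a constant learning rate (10⁻³) with full-batch training. As a minimal, self-contained sketch of that optimizer configuration (this is not the authors' code; the single-parameter quadratic objective and the step count are illustrative assumptions), a plain Adam loop under the reported learning rate looks like:

```python
import math

def adam_minimize(grad, w0, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=20_000):
    """Plain Adam on one scalar parameter; mirrors a full-batch
    setting where every step sees the whole dataset's gradient."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Illustrative objective f(w) = (w - 3)^2, so grad f(w) = 2(w - 3);
# with lr = 1e-3 the iterate settles near the minimizer w = 3.
w_star = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

The paper's actual experiments train DONs and FNOs on operator-learning datasets with the same hyperparameter pattern (constant learning rate, Adam, fixed batch size); only the objective and parameter count differ.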