Global Convergence in Neural ODEs: Impact of Activation Functions

Authors: Tianxiang Gao, Siyuan Sun, Hailiang Liu, Hongyang Gao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical findings are validated by numerical experiments, which not only support our analysis but also provide practical guidelines for scaling Neural ODEs, potentially leading to faster training and improved performance in real-world applications.
Researcher Affiliation | Academia | Tianxiang Gao, DePaul University (EMAIL); Siyuan Sun, Iowa State University (EMAIL); Hailiang Liu, Iowa State University (EMAIL); Hongyang Gao, Iowa State University (EMAIL)
Pseudocode | Yes | Algorithm 1 (ResNet f_θ^L Forward Computation on Input x) and Algorithm 2 (ResNet f_θ^L Forward and Backward Computation on Input x)
Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a source code repository for the methodology described.
Open Datasets | Yes | Both the Neural ODE and ResNet were initialized with the same random weights and evaluated on the MNIST dataset, with ResNet depths L ranging from 10 to 1,000. We used Softplus activation to ensure smoothness. Additionally, we also include convergence analysis on diverse datasets, such as CIFAR-10, AG News, and Daily Climate, as well as additional activations like GELU, further demonstrating the generalizability of our findings.
Dataset Splits | No | The paper mentions the MNIST, CIFAR-10, AG News, and Daily Climate datasets, and refers to a 'training set' and 'test loss'. For instance, Section 6 states 'the number of training samples (i.e., which is 1000 in our experiments)', and Section H.2 states 'By sampling 500 examples from the MNIST training set'. However, it does not provide specific percentages or absolute sample counts for training, validation, and test splits, nor does it explicitly reference standard splits with citations that define these details.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies | No | The paper mentions ODE solvers like 'Euler, rk4, and dopri5' in Section H.7, but it does not specify their version numbers or any other software dependencies with their versions.
Experiment Setup | Yes | We evaluated Neural ODE models with increasing widths, ranging from 10 to 1,000, and computed the NTK for each width. We monitored both the NTK's smallest eigenvalue and the distance of the model parameters from their initial values over 100 epochs. Softplus was used as the activation function to ensure smoothness and non-polynomial nonlinearity. Additionally, Section H.5 states 'The optimizer used was gradient descent with a learning rate of 0.1, and models were trained for 100 epochs' for experiments on diverse datasets with 'different widths (i.e., 500, 1000, 2000, 3000)'.
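Since the paper's code is not released, the forward pass referenced in its Algorithm 1 (ResNet f_θ^L Forward Computation) can only be sketched. The following is a minimal illustration: it uses the Softplus activation and depth L from the paper, but the 1/L residual scaling (which makes depth-L ResNets approximate a Neural ODE as L grows) and all function names are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def softplus(x):
    # Smooth, non-polynomial activation used in the paper's experiments
    return np.log1p(np.exp(x))

def resnet_forward(x, weights, L):
    """Illustrative ResNet f_theta^L forward pass (sketch, not the paper's code).

    Residual update h_{l+1} = h_l + (1/L) * softplus(W_l h_l); the 1/L step
    size is an assumed discretization of dh/dt = softplus(W(t) h(t)),
    consistent with the ResNet-to-Neural-ODE limit discussed in the paper.
    """
    h = np.asarray(x, dtype=float)
    for l in range(L):
        h = h + (1.0 / L) * softplus(weights[l] @ h)
    return h

# Toy usage: width-4 hidden state, depth L = 10
rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 4)) / np.sqrt(4) for _ in range(10)]
out = resnet_forward(rng.standard_normal(4), W, L=10)
```

As L increases (the paper sweeps 10 to 1,000), this iteration converges to the ODE solution at time 1, which is why the paper can compare ResNet and Neural ODE behavior under shared initial weights.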
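The experiment-setup row above describes monitoring the NTK's smallest eigenvalue across widths. A minimal sketch of how that quantity can be computed for an empirical NTK is shown below, using a toy one-hidden-layer Softplus network f(x) = vᵀ softplus(Wx); the architecture, gradients, and sizes here are illustrative assumptions, not the paper's Neural ODE model.

```python
import numpy as np

def ntk_min_eigenvalue(X, W, v):
    """Smallest eigenvalue of the empirical NTK Gram matrix K = J J^T.

    Row i of J is the gradient of f(x_i) = v^T softplus(W x_i) with respect
    to all parameters (W, v). A positive lambda_min(K) indicates the kernel
    is positive definite, the condition the paper's convergence analysis
    tracks during training. Toy model for illustration only.
    """
    grads = []
    for x in X:
        z = W @ x
        a = np.log1p(np.exp(z))           # softplus(z)
        s = 1.0 / (1.0 + np.exp(-z))      # softplus'(z) = sigmoid(z)
        grad_W = np.outer(v * s, x)       # df/dW, shape (width, dim)
        grad_v = a                        # df/dv, shape (width,)
        grads.append(np.concatenate([grad_W.ravel(), grad_v]))
    J = np.stack(grads)                   # (n_samples, n_params)
    K = J @ J.T                           # empirical NTK Gram matrix
    return float(np.linalg.eigvalsh(K)[0])  # eigvalsh sorts ascending

# Toy usage: 20 samples, input dim 5, hidden width 100
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
lam_min = ntk_min_eigenvalue(
    X,
    rng.standard_normal((100, 5)) / np.sqrt(5),
    rng.standard_normal(100) / np.sqrt(100),
)
```

Re-evaluating lam_min after each training epoch, and at increasing widths, mirrors the paper's check that the NTK stays well conditioned while parameters remain close to initialization.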