Accelerating Neural ODEs: A Variational Formulation-based Approach
Authors: Hongjue Zhao, Yuchen Wang, Hairong Qi, Zijie Huang, Han Zhao, Lui Sha, Huajie Shao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach accelerates NODE training by 10 to 1000 times compared to existing NODE-based methods, while achieving higher or comparable accuracy in dynamical systems. |
| Researcher Affiliation | Academia | 1University of Illinois Urbana-Champaign, 2William & Mary, 3University of Tennessee, Knoxville, 4University of California Los Angeles. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | The detailed algorithm is presented in Algorithm 1. First, we employ natural cubic spline regression to construct an analytical approximation of trajectories from noisy and partially observed data x, represented as the coefficients of the spline. Next, we estimate the values in trajectories based on the spline to remove noise and fill in missing values. Then, we compute the vector fields f based on estimated trajectories and utilize natural cubic spline regression again to build an analytical approximation of f. Finally, using Filon's method, we compute oscillatory integrals based on the analytical approximations of x and f. Algorithm 1 (VF Loss) — Data: a trajectory {(t_k, x(t_k))}_{k=0}^{K}, where 0 = t_0 < t_1 < … < t_K = T and x(t_k) = [x_1(t_k), x_2(t_k), …, x_d(t_k)] ∈ (ℝ ∪ {∅})^d. Input: smoothing coefficient λ ∈ [0, 1], the number of basis functions L, and neural network f_θ(·, x(·)). Output: the VF loss. (1) Perform spline regression on x to get spline coefficients a_{k,m} = SplineRegression(λ, {(t_k, x(t_k))}_{k=0}^{K}) ∈ ℝ^d, for k = 0, …, K−1 and m = 0, 1, 2, 3. (2) Estimate the trajectory: x̂(t_k) = a_{k,0} + Σ_{m=1}^{3} a_{k,m}(t_k − t_k)^m = a_{k,0} for k < K, and x̂(t_K) = Σ_{m=0}^{3} a_{K−1,m}(t_K − t_{K−1})^m. (3) Evaluate the vector fields: f(t_k) = f_θ(t_k, x̂(t_k)) for k = 0, …, K. (4) Perform spline interpolation on f to get spline coefficients b_{k,m} = SplineInterp({(t_k, f(t_k))}_{k=0}^{K}) ∈ ℝ^d, for k = 0, …, K−1 and m = 0, 1, 2, 3. (5) Compute oscillatory integrals for ℓ = 1, …, L: ∫_0^T x̂(t) φ̇_ℓ(t) dt = √(2/T)·(πℓ/T)·Σ_{k=0}^{K−1} Σ_{m=0}^{3} a_{k,m} ∫_{t_k}^{t_{k+1}} (t − t_k)^m cos(πℓt/T) dt and ∫_0^T f_θ(t, x̂(t)) φ_ℓ(t) dt ≈ √(2/T)·Σ_{k=0}^{K−1} Σ_{m=0}^{3} b_{k,m} ∫_{t_k}^{t_{k+1}} (t − t_k)^m sin(πℓt/T) dt; then c(x, f, φ_ℓ) = ∫_0^T f_θ(t, x̂(t)) φ_ℓ(t) dt + ∫_0^T x̂(t) φ̇_ℓ(t) dt. Return Σ_{ℓ=1}^{L} ‖c(x, f, φ_ℓ)‖_2² |
| Open Source Code | Yes | The code is available at https://github.com/ZhaoHongjue/VF-NODE-ICLR2025. |
| Open Datasets | Yes | We leverage data from the COVID-19 Data Hub (Guidotti & Ardia, 2020). [...] We select four dynamical systems from fields such as biology, biochemistry, genetics and epidemiology: the glycolytic oscillator (Sel'kov, 1968), the genetic toggle switch (Gardner et al., 2000), the repressilator (Elowitz & Leibler, 2000), and the age-structured SIR model (Ram & Schaposnik, 2021). |
| Dataset Splits | Yes | For each system, except the age-structured SIR model, we generate 125 trajectories with randomly sampled initial conditions, of which 100 are used for training and 25 for validation. [...] Additionally, we generate 25 test trajectories, each containing 200 randomly sampled points. [...] All settings remain the same for the age-structured SIR model except for the number of trajectories: 500 are generated for training and validation (400 for training and 100 for validation), and an additional 100 trajectories are generated for testing. |
| Hardware Specification | Yes | All the experiments are implemented on the same server, equipped with 4 A5000 GPUs with 24GB graphics memory. |
| Software Dependencies | No | All experiments in this work are implemented using jax (Bradbury et al., 2018). Specifically, the implementation of neural differential equation models is based on equinox (Kidger & Garcia, 2021) and diffrax (Kidger, 2022). To optimize models, we use optax (DeepMind et al., 2020). |
| Experiment Setup | Yes | Detailed hyperparameter settings of these models are provided in Appendix I.1. [...] For all tasks, we employed the Adam optimizer (Kingma & Ba, 2014), loading all training data in a single epoch. [...] For all tasks except modeling the temporal effect of chemotherapy on tumor volume, we set the number of training epochs to 5,000; for the tumor volume modeling task, we used 300 epochs. For all tasks except the simulation of the age-structured SIR model, we utilized the cosine onecycle schedule from optax as the learning rate scheduler, with an initial learning rate of 0.001. The scheduler parameters were set as follows: transition steps equal to the number of epochs, peak value at 0.01, pct start at 0.2, div factor at 100, and final div factor at 1,000. For the simulation of the age-structured SIR model, we employed the cosine decay schedule as the learning rate scheduler, also with an initial learning rate of 0.001. The parameters were set as follows: decay steps equal to the number of epochs, alpha at 0.01, and exponent at 1.0. |
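The core idea behind the VF loss quoted in the Pseudocode row is a weak-form residual: with sine test functions φ_ℓ(t) = √(2/T)·sin(πℓt/T), integration by parts turns the ODE residual ẋ − f into c_ℓ = ∫ f φ_ℓ dt + ∫ x φ̇_ℓ dt, which vanishes when x solves ẋ = f(t, x). The sketch below illustrates this on a known closed-form solution; it is not the paper's implementation — plain trapezoidal quadrature on a dense grid stands in for the cubic-spline representation and Filon's method, and the function names (`vf_residual`, `vf_loss`) are illustrative.

```python
import math

def vf_residual(x, f, T, ell, n=2000):
    """Weak-form residual c_ell = ∫ f(t, x(t)) φ_ell(t) dt + ∫ x(t) φ̇_ell(t) dt
    with φ_ell(t) = sqrt(2/T) sin(pi*ell*t/T). Trapezoidal quadrature is used
    here as a simple stand-in for the paper's spline + Filon quadrature."""
    h = T / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        phi = math.sqrt(2.0 / T) * math.sin(math.pi * ell * t / T)
        dphi = math.sqrt(2.0 / T) * (math.pi * ell / T) * math.cos(math.pi * ell * t / T)
        w = 0.5 if i in (0, n) else 1.0  # trapezoid endpoint weights
        total += w * (f(t, x(t)) * phi + x(t) * dphi) * h
    return total

def vf_loss(x, f, T, L):
    # Sum of squared residuals over the first L basis functions.
    return sum(vf_residual(x, f, T, ell) ** 2 for ell in range(1, L + 1))

# Exact solution of dx/dt = -x is x(t) = exp(-t), so the loss should be ~0
# when f matches the true vector field, and clearly positive when it does not.
x_true = lambda t: math.exp(-t)
f_true = lambda t, xt: -xt
f_wrong = lambda t, xt: -2.0 * xt
loss_true = vf_loss(x_true, f_true, T=1.0, L=5)
loss_wrong = vf_loss(x_true, f_wrong, T=1.0, L=5)
```

Because the sine basis vanishes at t = 0 and t = T, no boundary terms appear after integration by parts, which is why only the two integrals in c_ℓ are needed.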
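The learning-rate schedules named in the Experiment Setup row can be sketched directly from their listed parameters. The sketch below mimics the usual semantics of optax's `cosine_onecycle_schedule` (cosine warmup from `peak_value / div_factor` to `peak_value` over `pct_start` of training, then cosine decay to `peak_value / final_div_factor`) and `cosine_decay_schedule`; treat the exact formulas as an assumption about optax's behavior, not a quote from the paper.

```python
import math

def _cos_interp(start, end, frac):
    # Cosine interpolation from start to end as frac goes 0 -> 1 (clamped).
    frac = min(max(frac, 0.0), 1.0)
    return start + (end - start) * 0.5 * (1.0 - math.cos(math.pi * frac))

def onecycle_lr(step, transition_steps=5000, peak_value=0.01,
                pct_start=0.2, div_factor=100.0, final_div_factor=1000.0):
    """Cosine one-cycle schedule with the parameters reported in the paper."""
    init_value = peak_value / div_factor            # starting learning rate
    final_value = peak_value / final_div_factor     # learning rate at the end
    warmup = pct_start * transition_steps
    if step <= warmup:
        return _cos_interp(init_value, peak_value, step / warmup)
    return _cos_interp(peak_value, final_value,
                       (step - warmup) / (transition_steps - warmup))

def cosine_decay_lr(step, init_value=0.001, decay_steps=5000,
                    alpha=0.01, exponent=1.0):
    """Cosine decay from init_value to alpha * init_value, as used for the
    age-structured SIR task."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * frac))
    return init_value * ((1.0 - alpha) * cosine ** exponent + alpha)
```

With these settings the one-cycle schedule starts at 0.01/100 = 1e-4, peaks at 0.01 after 1,000 of the 5,000 steps, and ends at 1e-5; the cosine decay schedule runs from 0.001 down to 0.001 × 0.01 = 1e-5.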