Variational Neural Stochastic Differential Equations with Change Points
Authors: Yousef El-Laham, Zhongchang Sun, Haibei Zhu, Tucker Balch, Svitlana Vyetrenko
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we explore modeling change points in time-series data using neural stochastic differential equations (neural SDEs). We propose a novel model formulation and training procedure based on the variational autoencoder (VAE) framework for modeling time-series as a neural SDE. Lastly, we present an empirical evaluation that demonstrates the expressive power of our proposed model, showing that it can effectively model both classical parametric SDEs and some real datasets with distribution shifts. We present numerical experiments to verify the validity of the proposed CP-SDEVAE model. To that end, we conduct two different types of experiments. First, we conduct experiments on synthetic data generated from an Ornstein-Uhlenbeck (OU) process. We also use this dataset as a means to conduct basic ablations to understand the effect of different hyperparameters and the impact of the proposed predictive negative log-likelihood regularizer. We summarize the results in Figure 4. A summary figure showing the results of the generated time-series from each model (along with the detected change point) is shown in Figure 5. The experimental results, presented in Table 1, demonstrate that CP-SDEVAE outperforms most baseline models, even without assuming any change points (L = 0) in real datasets. |
| Researcher Affiliation | Collaboration | Yousef El-Laham (J.P. Morgan AI Research); Zhongchang Sun (University at Buffalo); Haibei Zhu (J.P. Morgan AI Research); Tucker Balch (J.P. Morgan AI Research); Svitlana Vyetrenko (J.P. Morgan AI Research) |
| Pseudocode | Yes | We present pseudocode for the training algorithm in Algorithm 1 and discuss each of the two steps in more detail in the following. Algorithm 1: Variational Neural SDEs with Change Points (CP-SDEVAE); Algorithm 2: Maximum Likelihood CP Update; Algorithm 3: Detection-based CP Update |
| Open Source Code | No | The implementation of the Latent SDE model utilized is based on an implementation found in the torchsde library at https://github.com/google-research/torchsde for modeling a Lorenz attractor. We adapted the implementation into our codebase and utilized an analogous architecture for a fair comparison. The paper does not provide an explicit statement or link for the source code of the methodology described in this paper. |
| Open Datasets | Yes | We ran experiments on four datasets, including the S&P500 prices, S&P500 intraday prices, cryptocurrency prices, and air quality measurements. B.4 Air Quality Dataset: We also used the "Beijing Multi-Site Air-Quality Dataset", which is available on Kaggle (Zhang et al., 2017). |
| Dataset Splits | Yes | To evaluate the performance of the models, each dataset was split into two subsets: a training set and a testing set. The split followed the "80-20" rule: 80% of the samples were randomly selected to form the training set, and the remaining 20% formed the testing set. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The implementation of the Latent SDE model utilized is based on an implementation found in the torchsde library at https://github.com/google-research/torchsde for modeling a Lorenz attractor. We adapted the implementation into our codebase and utilized an analogous architecture for a fair comparison. The authors mention the 'ruptures library in Python (Truong et al., 2020)', but no specific version number for this or the torchsde library is provided. |
| Experiment Setup | Yes | Optimizer and stochastic weight averaging: Our optimization strategy is carefully crafted to ensure robust model training. We utilize the Adam optimizer with a learning rate of 1×10⁻⁴ and a weight decay of 1×10⁻⁴. The training process continues for a maximum of E = 10000 epochs or until convergence is reached, as determined by the ELBO loss. Initialization of change points: The initialization of change points plays a crucial role in model performance. We explored two methods: random initialization and initialization based on mean shift using the ruptures library in Python (Truong et al., 2020). We assume the following hyperparameter settings: the encoder architecture is a 2-layer fully-connected neural network with standard ReLU activation functions; the latent dimension of the SDE is assumed to be 32; for all latent drift/diffusion functions, we use 2-layer fully-connected neural networks with LipSwish activations; for the decoder network, we use a 1-layer fully-connected network with ReLU activations; we use the Adam optimizer with a weight decay of 1×10⁻⁴ and J = 5 trajectories for each MC estimator of the ELBO. |
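The synthetic experiments quoted above are run on data generated from an Ornstein-Uhlenbeck (OU) process. A minimal sketch of such a generator using the standard Euler-Maruyama scheme is shown below; the function name, parameter defaults, and discretization grid are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, T=1.0, n_steps=100, n_paths=5, seed=0):
    """Euler-Maruyama simulation of the OU process
    dX_t = theta * (mu - X_t) dt + sigma dW_t.

    Returns an array of shape (n_paths, n_steps + 1).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        # Brownian increments dW ~ N(0, dt)
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x[:, k + 1] = x[:, k] + theta * (mu - x[:, k]) * dt + sigma * dw
    return x
```

With mean-reversion rate theta = 2 and long-run mean mu = 0, paths started at x0 = 1 decay toward zero over the horizon, which is the qualitative behavior a trained model should reproduce.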
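The experiment-setup row mentions initializing change points from a mean-shift detector (the paper uses the ruptures library). As a hedged stand-in for that initialization, the sketch below finds a single change point by minimizing the total within-segment squared error, which is the "l2" cost that ruptures' search methods optimize; the function and its `min_size` parameter are illustrative, not the paper's implementation.

```python
import numpy as np

def init_change_point(x, min_size=2):
    """Single change-point initialization by mean shift: pick the split index
    that minimizes the summed within-segment squared error (an 'l2' cost)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    best_k, best_cost = None, np.inf
    for k in range(min_size, n - min_size):
        left, right = x[:k], x[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

For L > 1 change points this single-split search would be applied recursively (binary segmentation), which is one of the search strategies ruptures provides.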