A Continuous-time Stochastic Gradient Descent Method for Continuous Data

Authors: Kexin Jin, Jonas Latz, Chenguang Liu, Carola-Bibiane Schönlieb

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental In numerical experiments, we show the suitability of our stochastic gradient process for (convex) polynomial regression with continuous data and the (non-convex) training of physics-informed neural networks with continuous sampling of function-valued data. We end with illustrating the applicability of the stochastic gradient process in a polynomial regression problem with noisy functional data, as well as in a physics-informed neural network. We now study two fields of application of the stochastic gradient process for continuous data. In the first example, we consider regularized polynomial regression with noisy functional data. In the second example, we study so-called physics-informed neural networks.
Researcher Affiliation Academia Kexin Jin (EMAIL), Department of Mathematics, Princeton University, Princeton, NJ 08544-1000, USA; Jonas Latz (EMAIL), Department of Mathematics, The University of Manchester, Manchester, M13 9PL, United Kingdom; Chenguang Liu (EMAIL), Delft Institute of Applied Mathematics, Technische Universiteit Delft, Delft, 2628 CD, The Netherlands; Carola-Bibiane Schönlieb (EMAIL), Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, United Kingdom
Pseudocode Yes Algorithm 1 Discretized Markov pure jump process Algorithm 2 Discretized Reflected Brownian motion on S
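Algorithm 2 (a discretized reflected Brownian motion) could be sketched as below. This is an illustrative reconstruction, not the paper's exact pseudocode: the 1D domain [0, 1], the Euler step, and the folding reflection at the boundaries are assumptions.

```python
import numpy as np

def reflected_brownian_1d(x0, sigma, dt, n_steps, rng=None):
    """Sketch of a discretized reflected Brownian motion on [0, 1].

    Each Euler step adds sigma * sqrt(dt) * N(0, 1) noise, then folds
    the result back into [0, 1] by reflection at the boundaries.
    """
    rng = np.random.default_rng() if rng is None else rng
    path = np.empty(n_steps + 1)
    path[0] = x0
    for n in range(n_steps):
        x = path[n] + sigma * np.sqrt(dt) * rng.standard_normal()
        # fold the real line onto [0, 1]: reflect at 0 and 1
        x = np.abs(x) % 2.0
        path[n + 1] = 2.0 - x if x > 1.0 else x
    return path
```

Sampling a 2D index point on (0, 1) × (0, 1), as used for the transport-equation training set, would amount to running two independent copies of this process.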
Open Source Code No The paper does not provide an explicit statement or link to their own source code for the methodology described. It mentions using existing packages and code from other works, but not their own implementation.
Open Datasets No The paper uses 'artificial data g' for polynomial regression and defines a '1D Transport equation' with a known analytical solution, indicating synthetic data generation rather than the use of pre-existing public datasets. No concrete access information for any dataset is provided.
Dataset Splits Yes From the interior of the domain of time and space variables, i.e. (0, 1) × (0, 1), we use Algorithm 2 with σ = 0.5 to sample the train set of size 3 × 10^4 for SGPC and SGPD, and we uniformly sample 600 points for the train set of SGD. In addition, as part of the train set for all three methods, we sample uniformly 20 and 60 points for the initial condition and periodic boundary condition, respectively. The learning rate for SGD and SGPC is 0.01. ... We evaluate the models by testing on a uniformly sampled test set of size 2 × 10^3 and compare the predicted values with the theoretical solution u(t, x) = sin(2π(x − t)).
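A minimal sketch of such a test-set evaluation against the analytical transport-equation solution u(t, x) = sin(2π(x − t)) might look as follows. The uniform sampling and the RMSE metric are assumptions for illustration; the authors' code is not available.

```python
import numpy as np

def transport_solution(t, x):
    """Analytical solution of the 1D transport equation, u(t, x) = sin(2*pi*(x - t))."""
    return np.sin(2 * np.pi * (x - t))

def rmse(u_pred, u_ref):
    """Root-mean-square error between predicted and reference values."""
    return np.sqrt(np.mean((u_pred - u_ref) ** 2))

# Hypothetical uniformly sampled test set of size 2 * 10^3 on (0, 1) x (0, 1);
# columns are (t, x).
rng = np.random.default_rng(0)
test = rng.uniform(size=(2000, 2))
u_true = transport_solution(test[:, 0], test[:, 1])
```

A trained model's predictions at the test points would then be compared via `rmse(u_pred, u_true)`.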
Hardware Specification Yes We train the networks on Google Colab Pro using GPUs (often T4 and P100, sometimes K80).
Software Dependencies No Integrated PyTorch-based packages are available; see, for example, Chen et al. (2020) and Pedro et al. (2019). The paper mentions 'PyTorch' but does not specify a version number for it or any other software used in their own implementation.
Experiment Setup Yes In our experiments, we choose h = 0.1. We use Algorithms 1 and 2 to discretize the index processes with constant stepsize t(ℓ) − t(ℓ−1) = 10^-2. We perform J := 100 repeated runs for each of the considered settings for N := 5 × 10^4 time steps and thus obtain a family of trajectories (θ(j,n)), n = 1, ..., N, j = 1, ..., J. In each case, we choose the initial values V(0) := 0 and θ(j,0) := (0.5, ..., 0.5). For our estimation, we set α := 10^-4 and use the K = 9 Legendre polynomials with degrees 0, ..., 8. The learning rate for SGD and SGPC is 0.01. The learning rate for SGPD is defined as η(t) = 0.01 / log(t + 2)^0.3, which is chosen such that the associated µ := 1/η satisfies Assumption 4. For all three methods, we use Adam (see Kingma and Ba, 2015) as the optimizer to speed up convergence; we use an L2 regularizer with weight 0.1 to avoid overfitting. Each model is trained over 600 iterations with batch size 50. The training process for SGPC and SGPD contains only one epoch, while we train 50 epochs in the SGD case.
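The decreasing SGPD learning rate can be sketched as below. Note that the exact form (0.01 divided by log(t + 2) raised to the 0.3 power) is a reconstruction of the garbled formula in the quoted text, chosen so that µ := 1/η is increasing, as the stated Assumption 4 requires.

```python
import math

def eta_sgpd(t):
    """Reconstructed SGPD learning-rate schedule: eta(t) = 0.01 / log(t + 2)**0.3.

    Decreasing in t, so that mu(t) = 1 / eta(t) grows over time.
    """
    return 0.01 / math.log(t + 2) ** 0.3

def mu(t):
    """Associated rate mu := 1 / eta, which should be increasing."""
    return 1.0 / eta_sgpd(t)
```

For comparison, SGD and SGPC use a constant learning rate of 0.01 in the quoted setup.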