Neural Q-learning for solving PDEs
Authors: Samuel N. Cohen, Deqing Jiang, Justin Sirignano
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results are presented in Section 6 (Numerical experiments): "In this section, we present numerical results where we apply our algorithm to solve a family of partial differential equations. The approximator matches the solution of the differential equation closely in a relatively short period of training time in these test cases." In Section 6.2.1 (1-dimensional case), for dimension n = 1 the domain is Ω = (−1, 1), and the exact solution of the test equation is u(x) = 1/γ + c₁(e^{√(2γ)x} + e^{−√(2γ)x}), where c₁ = −1/(γ(e^{√(2γ)} + e^{−√(2γ)})); the subsection sets γ = 0.1. To track training progress, the authors monitor the average loss level at time t, e_t := ∫_Ω [LQ_t]² dµ, where µ is the Lebesgue measure on Ω, estimated using the sample of evaluation points at each step. |
| Researcher Affiliation | Academia | Samuel N. Cohen (EMAIL), Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK; Deqing Jiang (EMAIL), Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK; Justin Sirignano (EMAIL), Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK |
| Pseudocode | Yes | Algorithm 1 (Q-PDE Algorithm). Parameters: hyper-parameters of the single-layer neural network; domain Ω; PDE operator L; boundary condition on ∂Ω; sampling measure µ; number of Monte Carlo points M; upper bound T on training time. Initialise: neural net S^N; auxiliary function η; approximator Q^N based on S^N and η; smoothing function ψ_N; learning-rate schedule {α^N_t}_{t≥0}; stopping criterion ϵ; current time t = 0. While err > ϵ and t ≤ T do: sample M points {x_i} in Ω using µ; compute the biased gradient estimator G^N_{M,t} using (112); update the neural network parameters via (111); compute err = (1/M) Σ_{i=1}^{M} ψ_N(LQ^N_t(x_i))²; update time t. Return approximator Q^N_t. |
| Open Source Code | Yes | The implementation is available at https://github.com/DeqingJ/QPDE. |
| Open Datasets | No | Section 6.2 discusses a "Test equation: Survival time of a Brownian motion", which has an explicit solution. Algorithm 1, under the step "Sample M points in Ω using µ, {x_i}", indicates that the training data is generated rather than drawn from a pre-existing, publicly available dataset. There is no mention of external datasets or links to any data repositories. |
| Dataset Splits | No | The paper describes sampling M points from the domain for training (Algorithm 1: "Sample M points in Ω using µ, {x_i}"). However, it does not specify any explicit training, validation, or test splits, or percentages, for these sampled points. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications. It only mentions the use of "PyTorch" for implementation. |
| Software Dependencies | No | The paper mentions using "PyTorch" and the "ADAM adaptive gradient descent rule" (optimizer) but does not provide specific version numbers for either of these software components. |
| Experiment Setup | Yes | A.8.1 Table of hyper-parameters: Dimension 1 — Q-PDE, 1 layer, 64 units, Sigmoid activation, ADAM optimizer, number of MC samples l_MC = 1k to u_MC = 2k; Dimension 20 — Q-PDE, 1 layer, 256 units, Sigmoid, ADAM, l_MC = 2k to u_MC = 10k; Dimension 20 — DGM, 1 layer, 256 units, Sigmoid, ADAM, l_MC = 2k to u_MC = 10k. A.8.2 Initialization of neural networks: parameters of the single-layer net S_0 are randomly sampled; the c^i_0 are i.i.d. from the uniform distribution U[−1, 1], and the w^i_0 and b^i_0 are i.i.d. from the Gaussian distributions N(0, I_d) and N(0, 1), where I_d is the identity matrix of dimension d. A.8.3 Learning process: the approximator is Q_t := S_t · η, where η(x) := 1 − |x|². The built-in ADAM optimizer is used with initial learning rate l_0 = 0.5, decaying as l_t = l_0/(1 + t/200). At each step, MC samples are drawn for the gradient estimate; since a larger number of MC sample points reduces the random error, the number M_t of MC points sampled at each step increases linearly as M_t = round(l_MC + (u_MC − l_MC)·t/T), where T is the terminal number of training steps. |
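As a sanity check on the reconstructed 1-dimensional test problem from Section 6.2.1, the closed-form survival-time solution can be evaluated directly. The sketch below assumes the PDE takes the form γu − ½u″ = 1 on Ω = (−1, 1) with u(±1) = 0, which is consistent with the stated exact solution; the function names (`u_exact`, `pde_residual`) are illustrative, not from the paper.

```python
import math

GAMMA = 0.1  # value used in Section 6.2.1

def u_exact(x):
    """Reconstructed exact solution of gamma*u - u''/2 = 1 on (-1, 1), u(+-1) = 0."""
    s = math.sqrt(2 * GAMMA)
    c1 = -1.0 / (GAMMA * (math.exp(s) + math.exp(-s)))
    return 1.0 / GAMMA + c1 * (math.exp(s * x) + math.exp(-s * x))

def pde_residual(x, h=1e-4):
    """gamma*u - u''/2 - 1, with u'' approximated by a central finite difference."""
    u_xx = (u_exact(x + h) - 2 * u_exact(x) + u_exact(x - h)) / h**2
    return GAMMA * u_exact(x) - 0.5 * u_xx - 1.0

print(u_exact(1.0), u_exact(-1.0))  # boundary values, both ~0
print(max(abs(pde_residual(0.1 * k)) for k in range(-9, 10)))  # residual ~0
```

With these constants, the boundary values vanish and the finite-difference residual is at the level of floating-point noise, matching the claim that c₁ is chosen to enforce u(±1) = 0.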
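The initialization (A.8.2) and the hard boundary-condition construction Q = S · η (A.8.3) can be sketched in a few lines for the 1-dimensional, 64-unit sigmoid network of A.8.1. This is a minimal illustration, not the paper's implementation: η(x) = 1 − x² is the assumed 1-D form of the auxiliary function, and variable names are chosen for readability.

```python
import math, random

random.seed(0)
n_units = 64  # 1-dimensional case with 64 hidden units (A.8.1)

# A.8.2 initialisation: c_i ~ U[-1, 1], w_i ~ N(0, 1), b_i ~ N(0, 1) (here d = 1)
c = [random.uniform(-1.0, 1.0) for _ in range(n_units)]
w = [random.gauss(0.0, 1.0) for _ in range(n_units)]
b = [random.gauss(0.0, 1.0) for _ in range(n_units)]

def S(x):
    """Single-layer network S(x) = sum_i c_i * sigmoid(w_i * x + b_i)."""
    return sum(ci / (1.0 + math.exp(-(wi * x + bi))) for ci, wi, bi in zip(c, w, b))

def Q(x):
    """Approximator Q = S * eta with eta(x) = 1 - x^2 (assumed form)."""
    return S(x) * (1.0 - x * x)

print(abs(Q(1.0)), abs(Q(-1.0)))  # 0.0 0.0 — boundary condition holds by construction
```

Because η vanishes on ∂Ω, the approximator satisfies the zero boundary condition exactly for any network parameters, which is why the algorithm only needs to penalize the interior PDE residual.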
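The learning-rate decay and the Monte Carlo ramp-up reported in A.8.3 are simple enough to transcribe directly. A sketch, with illustrative function names and the 1-dimensional values l_MC = 1k, u_MC = 2k from A.8.1:

```python
def learning_rate(t, l0=0.5):
    """A.8.3 decay rule: l_t = l_0 / (1 + t/200), starting from l_0 = 0.5."""
    return l0 / (1.0 + t / 200.0)

def mc_points(t, T, l_mc, u_mc):
    """A.8.3 ramp: M_t = round(l_MC + (u_MC - l_MC) * t / T)."""
    return round(l_mc + (u_mc - l_mc) * t / T)

print(learning_rate(0), learning_rate(200))            # 0.5 0.25
print(mc_points(0, 1000, 1000, 2000),
      mc_points(1000, 1000, 1000, 2000))               # 1000 2000
```

The linear ramp means early steps are cheap and noisy while later steps spend more samples per gradient estimate, trading compute for lower Monte Carlo variance as training converges.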