DRM Revisited: A Complete Error Analysis
Authors: Yuling Jiao, Ruoxuan Li, Peiying Wu, Jerry Zhijian Yang, Pingwen Zhang
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we address this gap by providing a comprehensive error analysis of the Deep Ritz Method (DRM). Specifically, we investigate a foundational question in the theoretical analysis of DRM under the overparameterized regime: given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number of iterations, such that the output of the gradient descent process closely approximates the true solution of the underlying partial differential equation to the specified precision? Keywords: deep Ritz method, projected gradient descent, over-parameterization, complete error analysis, new optimization error analysis |
| Researcher Affiliation | Academia | Yuling Jiao EMAIL School of Artificial Intelligence, Wuhan University, Wuhan, China National Center for Applied Mathematics in Hubei, Wuhan University, Wuhan, China Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan, China Ruoxuan Li EMAIL School of Mathematics and Statistics, Wuhan University, Wuhan, China Peiying Wu EMAIL School of Mathematics and Statistics, Wuhan University, Wuhan, China Jerry Zhijian Yang EMAIL School of Mathematics and Statistics, Wuhan University, Wuhan, China Wuhan Institute for Math & AI, Wuhan University, Wuhan, China National Center for Applied Mathematics in Hubei, Wuhan University, Wuhan, China Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan, China Pingwen Zhang EMAIL Wuhan Institute for Math & AI, Wuhan University, Wuhan, China School of Mathematical Sciences, Peking University, Beijing, China |
| Pseudocode | Yes | 2.4 Projected Gradient Descent. We use the PGD algorithm to minimize L̂(u_{m,θ}) in Equation 7; it is an iterative optimization method suited to constrained problems. Since the Monte Carlo samples {X_p}_{p=1}^{N_in}, {Y_q}_{q=1}^{N_b} are fixed during the optimization process, L̂(u_{m,θ}) becomes a function solely of the weights θ^m_total, which we denote F̂(θ^m_total) = F̂(θ^m_in, θ^m_out). The PGD algorithm consists of the following three steps. Initialization: start with an initial guess (θ^m_total)^[0] = (θ^m_in, θ^m_out)^[0] as follows: (i) for the linear coefficients θ^m_out, set (θ^m_out)^[0] = 0, i.e., (c_k)^[0] = 0 for k = 1, …, m; (ii) for the sub-network parameters θ^m_in, initialize each element of (θ^m_in)^[0] independently from the same uniform distribution U[−B, B], that is, (a^(ℓ)_{k,i,j})^[0] i.i.d. ∼ U[−B, B] and (b^(ℓ)_{k,i})^[0] i.i.d. ∼ U[−B, B]. Constraint set: choose η, ζ > 0 and determine the constraint set as follows: (i) let A_η be the (random) set of all weight vectors θ^m_in satisfying ‖θ^m_in − (θ^m_in)^[0]‖_2 ≤ η; (ii) let B_ζ be the set of all weight vectors θ^m_out satisfying ∑_{k=1}^m \|c_k\| ≤ ζ. Iterative update: let T ∈ ℕ_+, λ > 0; for each iteration t = 0, …, T−1, (i) compute the gradient of the objective function at the current point, g^[t] = ∇_{θ^m_total} F̂(θ^{m,[t]}_in, θ^{m,[t]}_out); (ii) update the weight vector by first performing a gradient descent step with step size λ and then projecting the result onto the feasible set: (θ^m_in, θ^m_out)^[t+1] = Proj_{A_η × B_ζ}((θ^m_in, θ^m_out)^[t] − λ g^[t]). |
| Open Source Code | No | The paper does not provide concrete access to source code. There are no links to repositories, explicit statements about code release, or mentions of code in supplementary materials. |
| Open Datasets | No | The paper uses the Monte Carlo method to discretize the integral-type functional L, generating samples {X_p}_{p=1}^{N_in} i.i.d. ∼ U(Ω) and {Y_q}_{q=1}^{N_b} i.i.d. ∼ U(∂Ω). It does not use or provide concrete access information for any publicly available or open external datasets. |
| Dataset Splits | No | The paper uses Monte Carlo samples generated from uniform distributions U(Ω) and U(∂Ω). It mentions N_in = N_b = N_s for these samples but does not provide specific train/test/validation splits for any external dataset, as it is a theoretical analysis generating its own data points. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or cloud computing specifications used for running experiments. The paper is purely theoretical and does not report on experimental results requiring hardware. |
| Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers. The paper focuses on theoretical analysis and does not describe a practical implementation with specific software components. |
| Experiment Setup | No | The paper is a theoretical work providing error analysis and mathematical bounds for the Deep Ritz Method. It defines theoretical parameters and conditions for its main theorem (e.g., m, W, L, B, η, ζ, T, λ, Ns in terms of epsilon, mu, etc.), but these are part of the mathematical proof and not specific experimental setup details like concrete hyperparameter values or training configurations for a practical implementation. |
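The PGD procedure quoted in the Pseudocode row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the constraint sets are the ℓ2 ball A_η around the inner-weight initialization and the ℓ1 ball B_ζ for the outer coefficients, and since A_η × B_ζ is a product set, projection factors into independent projections onto each ball. The inner parameters are flattened into a single vector for simplicity, and `grad_fn` stands in for the gradient of the empirical objective F̂, which the paper defines via Monte Carlo samples.

```python
import numpy as np

def project_l2_ball(theta, center, eta):
    # Projection onto A_eta: the l2 ball of radius eta centered at the
    # initialization. Rescale the offset from the center if it is too long.
    d = theta - center
    norm = np.linalg.norm(d)
    return center + d * (eta / norm) if norm > eta else theta

def project_l1_ball(c, zeta):
    # Projection onto B_zeta: the l1 ball of radius zeta, via soft-thresholding
    # with the threshold found by the sorting method of Duchi et al. (2008).
    if np.abs(c).sum() <= zeta:
        return c
    u = np.sort(np.abs(c))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(c) + 1) > css - zeta)[0][-1]
    tau = (css[rho] - zeta) / (rho + 1)
    return np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)

def pgd(grad_fn, dim_in, m, B, eta, zeta, lam, T, rng):
    # Initialization: inner weights i.i.d. U[-B, B], outer coefficients zero,
    # matching steps (i)-(ii) of the quoted algorithm.
    theta_in0 = rng.uniform(-B, B, size=dim_in)
    theta_in, theta_out = theta_in0.copy(), np.zeros(m)
    for _ in range(T):
        g_in, g_out = grad_fn(theta_in, theta_out)
        # Gradient step with step size lam, then project onto A_eta x B_zeta.
        theta_in = project_l2_ball(theta_in - lam * g_in, theta_in0, eta)
        theta_out = project_l1_ball(theta_out - lam * g_out, zeta)
    return theta_in, theta_out
```

As a sanity check on a toy quadratic objective with gradient `lambda ti, to: (2 * (ti - a), 2 * (to - b))`, the iterates converge to the projection of the unconstrained minimizer onto the feasible set, and every iterate satisfies both constraints by construction.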