SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures
Authors: Peimeng Guan, Mark A. Davenport
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate SGD jittering using denoising toy examples, seismic deconvolution, and single-coil MRI reconstruction. Both SGD jittering and its SPGD extension yield cleaner reconstructions for out-of-distribution data and demonstrate enhanced robustness against adversarial attacks. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA. |
| Pseudocode | No | The paper describes iterative processes and algorithmic steps mathematically (e.g., Equation 4 for GD, Equation 8 for SGD jittering) but does not include a dedicated section or figure explicitly labeled 'Pseudocode' or 'Algorithm', nor are the steps presented in a structured, code-like format. |
| Open Source Code | Yes | The code for the proposed SGD and SPGD jittering methods is provided here: https://github.com/InvProbs/SGD-jittering. |
| Open Datasets | Yes | We train models with single-coil knee MRI from the fastMRI dataset (Knoll et al., 2020) with 4× acceleration, i.e., 1/4 of the measurements in k-space are used for reconstruction. To assess generalization, we use a different knee dataset from Bickle & Jin (2021), which includes giant cell tumors absent from the training data. |
| Dataset Splits | Yes | We use 200 samples for training and 50 for evaluation. |
| Hardware Specification | Yes | All models are trained on an Nvidia RTX 3080, using the Adam optimizer with a learning rate of 1e-4. |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' but does not specify the software framework (e.g., PyTorch, TensorFlow) or its version, nor other key libraries with version numbers required for reproducibility. |
| Experiment Setup | Yes | We use 10-iteration gradient-descent loop-unrolling architectures for all tasks. For the toy example, we use a 3-layer MLP with a hidden dimension of 32 as the learned gradient network. For seismic deconvolution and MRI reconstruction, a 5-layer and an 8-layer DnCNN with 64 hidden channels are used as the learned gradient network, respectively. All models are trained on an Nvidia RTX 3080, using the Adam optimizer with a learning rate of 1e-4. The jittering noise variance for each method is selected based on both robustness and accuracy. For input-jittering training, the noise variances for 2D denoising, seismic deconvolution, and MRI reconstruction are 0.01, 0.05, and 0.05, respectively. For SGD jittering training, we choose variances of 0.01, 0.1, and 0.01 for the same three tasks. Training batch sizes for 2D denoising, seismic deconvolution, and MRI are 256, 16, and 4, respectively. |
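The setup above can be sketched as a small, self-contained example. This is a minimal NumPy illustration of an unrolled gradient-descent reconstruction with jittering noise injected at each intermediate iterate; the initialization, step size, and exact injection point are assumptions for illustration (the paper's Eq. 8 defines the precise scheme), and `grad_net` stands in for the learned gradient network (MLP or DnCNN in the paper).

```python
import numpy as np

def unrolled_gd_jittering(y, A, grad_net, iters=10, step=0.1, sigma=0.0, rng=None):
    """Unrolled gradient descent with jittering noise on intermediate iterates.

    y        : measurements
    A        : forward operator (matrix, for this toy sketch)
    grad_net : learned gradient network (a callable; placeholder here)
    sigma    : jittering noise std (paper uses variances 0.01/0.1/0.01
               across its three tasks; injection point is an assumption)
    """
    rng = rng or np.random.default_rng(0)
    x = A.T @ y  # simple back-projection initialization (assumption)
    for _ in range(iters):
        data_grad = A.T @ (A @ x - y)            # gradient of data fidelity
        x = x - step * (data_grad + grad_net(x)) # learned + analytic gradient step
        x = x + sigma * rng.standard_normal(x.shape)  # jittering perturbation
    return x
```

With `sigma=0` and a zero gradient network this reduces to plain unrolled gradient descent, which converges toward the least-squares solution; during training, `sigma > 0` perturbs each iterate so the learned gradient network sees noisy intermediate states.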