SGD Jittering: A Training Strategy for Robust and Accurate Model-Based Architectures

Authors: Peimeng Guan, Mark A. Davenport

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We validate SGD jittering using denoising toy examples, seismic deconvolution, and single-coil MRI reconstruction. Both SGD jittering and its SPGD extension yield cleaner reconstructions for out-of-distribution data and demonstrate enhanced robustness against adversarial attacks.
Researcher Affiliation Academia 1Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA.
Pseudocode No The paper describes iterative processes and algorithmic steps mathematically (e.g., Equation 4 for GD, Equation 8 for SGD jittering) but does not include a dedicated section or figure explicitly labeled 'Pseudocode' or 'Algorithm', nor are the steps presented in a structured, code-like format.
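Since the paper presents the iterative updates only as equations, the following is a minimal, hypothetical sketch of the jittered unrolled gradient descent idea (function names, step size, initialization, and where the noise enters are assumptions for illustration, not the authors' exact update):

```python
import numpy as np

def unrolled_gd(y, A, grad_net, n_iters=10, step=0.1, jitter_std=0.0, rng=None):
    """Unrolled gradient descent for a regularized least-squares problem.

    grad_net(x) stands in for the learned gradient of the regularizer.
    With jitter_std > 0, Gaussian noise is injected into each iterate
    during training -- a sketch of the SGD-jittering idea.
    """
    rng = rng or np.random.default_rng(0)
    x = A.T @ y  # simple back-projection initialization (an assumption)
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y) + grad_net(x)  # data fidelity + learned term
        x = x - step * grad
        if jitter_std > 0:
            x = x + jitter_std * rng.standard_normal(x.shape)  # jittering noise
    return x

# toy usage: identity forward operator, zero "learned" gradient
A = np.eye(4)
y = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = unrolled_gd(y, A, grad_net=lambda x: 0.0, n_iters=10, step=0.5)
```

At test time `jitter_std` would be set to 0, so the noise only shapes training.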
Open Source Code Yes The code for the proposed SGD and SPGD jittering methods is provided at https://github.com/InvProbs/SGD-jittering.
Open Datasets Yes We train models with single-coil knee MRI from the fastMRI dataset (Knoll et al., 2020) with 4× acceleration, i.e., 1/4 of the k-space measurements are used for reconstruction. To assess generalization, we use a different knee dataset from Bickle & Jin (2021), which includes giant cell tumors absent from the training data.
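For context, 4× acceleration means keeping roughly 1/4 of the k-space lines. A hypothetical Cartesian undersampling mask illustrating this (the center fraction and random column selection are assumptions; the paper's exact sampling pattern is not specified here):

```python
import numpy as np

def cartesian_mask(n_cols, accel=4, center_frac=0.08, rng=None):
    """Column-wise mask keeping ~1/accel of k-space lines.

    Always retains a small band of low-frequency (center) columns and
    fills the rest of the budget with random columns. Illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    mask = np.zeros(n_cols, dtype=bool)
    n_center = max(1, int(round(center_frac * n_cols)))
    start = (n_cols - n_center) // 2
    mask[start:start + n_center] = True        # keep the low frequencies
    n_target = n_cols // accel                 # total lines to keep (1/accel)
    remaining = np.flatnonzero(~mask)
    n_extra = max(0, n_target - n_center)
    mask[rng.choice(remaining, size=n_extra, replace=False)] = True
    return mask

m = cartesian_mask(320)  # keeps 320 // 4 = 80 of 320 columns
```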
Dataset Splits Yes We use 200 samples for training and 50 for evaluation.
Hardware Specification Yes All models are trained on an Nvidia RTX 3080, using the Adam optimizer with a learning rate of 1e-4.
Software Dependencies No The paper mentions the 'Adam optimizer' but does not specify the software framework (e.g., PyTorch, TensorFlow) or its version, nor other key libraries with version numbers required for reproducibility.
Experiment Setup Yes We use 10-iteration gradient descent loop unrolling architectures for all tasks. For the toy example, the learned gradient network is a 3-layer MLP with a hidden dimension of 32. For seismic deconvolution and MRI reconstruction, 5-layer and 8-layer DnCNNs with 64 hidden channels are used as the learned gradient networks, respectively. All models are trained on an Nvidia RTX 3080, using the Adam optimizer with a learning rate of 1e-4. The jittering noise variance for each method is selected to balance robustness and accuracy. For input jittering training, the noise variances for 2D denoising, seismic deconvolution, and MRI reconstruction are 0.01, 0.05, and 0.05, respectively. For SGD jittering training, the corresponding variances are 0.01, 0.1, and 0.01. Training batch sizes for 2D denoising, seismic deconvolution, and MRI are 256, 16, and 4, respectively.
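The reported hyperparameters can be collected into a single configuration for quick reference (the values are copied from the text above; the dictionary layout itself is our own and not from the paper):

```python
# Compact restatement of the reported training setup.
experiment_setup = {
    "unrolling_iterations": 10,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "gpu": "Nvidia RTX 3080",
    "tasks": {
        "2d_denoising": {
            "gradient_net": "3-layer MLP, hidden dim 32",
            "input_jitter_var": 0.01,
            "sgd_jitter_var": 0.01,
            "batch_size": 256,
        },
        "seismic_deconvolution": {
            "gradient_net": "5-layer DnCNN, 64 hidden channels",
            "input_jitter_var": 0.05,
            "sgd_jitter_var": 0.1,
            "batch_size": 16,
        },
        "mri_reconstruction": {
            "gradient_net": "8-layer DnCNN, 64 hidden channels",
            "input_jitter_var": 0.05,
            "sgd_jitter_var": 0.01,
            "batch_size": 4,
        },
    },
}
```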