Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning

Authors: Kuangyu Ding, Jingyang Li, Kim-Chuan Toh

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on quadratic inverse problems demonstrate SBPG's robustness in terms of stepsize selection and sensitivity to the initial point. Furthermore, we introduce a momentum-based variant, MSBPG, which enhances convergence by relaxing the mini-batch size requirement while preserving the optimal oracle complexity. We apply MSBPG to the training of deep neural networks, utilizing a polynomial kernel function to ensure smooth adaptivity of the loss function. Experimental results on benchmark datasets confirm the effectiveness and robustness of MSBPG in training neural networks.
Researcher Affiliation | Academia | Kuangyu Ding EMAIL, Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076; Jingyang Li EMAIL, Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076; Kim-Chuan Toh EMAIL, Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076
Pseudocode | Yes | Details of the implementation are provided in Algorithm 1: Momentum-based Stochastic Bregman Proximal Gradient (MSBPG) for training neural networks.
Open Source Code | No | The paper describes MSBPG as a promising "universal open-source optimizer for future applications" but does not explicitly state that the code for the current work is being released, nor does it provide a link.
Open Datasets | Yes | We conducted experiments on several representative benchmarks, including VGG16 (Simonyan and Zisserman, 2014) and ResNet34 (He et al., 2016) on the CIFAR10 dataset (Krizhevsky et al., 2009), ResNet34 (He et al., 2016) and DenseNet121 (Huang et al., 2017) on the CIFAR100 dataset (Krizhevsky et al., 2009), and LSTMs (Hochreiter and Schmidhuber, 1997) on the Penn Treebank dataset (Marcinkiewicz, 1994).
Dataset Splits | Yes | We used the default training hyperparameters of SGD, Adam, and AdamW in these settings (He et al., 2016; Zhuang et al., 2020; Chen et al., 2021), and set MSBPG's learning rate (initial stepsize) as 0.1, momentum coefficient β as 0.9, and weight decay coefficient λ2 as 1×10^-3. ... We followed the standard experimental setup for training LSTMs (Zhuang et al., 2020; Chen et al., 2021)...
Hardware Specification | Yes | The experiments for the quadratic inverse problem are conducted using MATLAB R2021b on a Windows workstation equipped with a 12-core Intel Xeon E5-2680 @ 2.50GHz processor and 128GB of RAM. For the deep learning experiments, we conducted the experiments using PyTorch running on a single RTX 3090 GPU.
Software Dependencies | Yes | The experiments for the quadratic inverse problem are conducted using MATLAB R2021b on a Windows workstation... For the deep learning experiments, we conducted the experiments using PyTorch running on a single RTX 3090 GPU.
Experiment Setup | Yes | For our experiments, we utilized two common training strategies: reducing the stepsize to 10% of its original value near the end of training (Zhuang et al., 2020; Chen et al., 2021; Luo et al., 2019), and using a cosine annealing schedule for stepsizes (Loshchilov and Hutter, 2016, 2017). ... For MSBPG, we set the learning rate to 25, 80, and 80 for 1-, 2-, and 3-layer LSTMs, respectively, with momentum parameter β = 0.9 and weight decay coefficient λ2 = 2×10^-6. For the layerwise kernel function φi(Wi) = (1/r)||Wi||^r, we set r = 4 and δ = 1×10^-6.
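The momentum-based Bregman proximal step quoted above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a layerwise kernel of the form h(W) = (1/r)||W||^r + (δ/2)||W||^2 (consistent with the φi quoted in the Experiment Setup row, with a δ-weighted quadratic term added for strong convexity), an exponential-moving-average form of the momentum, and a scalar Newton solve to invert ∇h; the function name and these modeling choices are assumptions.

```python
import numpy as np

def msbpg_layer_step(w, grad, momentum, lr=0.1, beta=0.9, r=4, delta=1e-6):
    """One hypothetical MSBPG update for a single layer's weights w.

    Assumed kernel: h(W) = (1/r)||W||^r + (delta/2)||W||^2, so that
    grad_h(W) = (||W||^(r-2) + delta) * W.
    """
    # Momentum-averaged stochastic gradient (exponential moving average).
    momentum = beta * momentum + (1 - beta) * grad
    # Mirror step in the dual space: v = grad_h(w) - lr * momentum.
    v = (np.linalg.norm(w) ** (r - 2) + delta) * w - lr * momentum
    # Invert grad_h: the new iterate is parallel to v, with norm t solving
    # the scalar monotone equation delta * t + t^(r-1) = ||v||.
    s = np.linalg.norm(v)
    if s == 0.0:
        return np.zeros_like(v), momentum
    t = s  # Newton's method on f(t) = delta*t + t^(r-1) - s.
    for _ in range(50):
        t -= (delta * t + t ** (r - 1) - s) / (delta + (r - 1) * t ** (r - 2))
    return (t / s) * v, momentum
```

With r = 2 and delta = 0 the kernel reduces to the Euclidean one and the step collapses to SGD with momentum; the r = 4 polynomial kernel quoted above instead damps the step size adaptively when ||w|| is large, which is the smooth-adaptivity property the paper attributes to MSBPG.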