Nonlinearly Preconditioned Gradient Methods under Generalized Smoothness
Authors: Konstantinos Oikonomidis, Jan Quan, Emanuel Laude, Panagiotis Patrinos
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present some simple experiments that display the behavior of the proposed method on problems beyond traditional Lipschitzian assumptions. The code for reproducing the experiments is publicly available. [...] Figure 2. Minimizing (1/4) x^4 using (2). [...] Figure 3. Nonconvex phase retrieval. [...] Figure 4. Simple NN training. |
| Researcher Affiliation | Collaboration | 1Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium 2Leuven.AI-KU Leuven Institute for AI, 3000 Leuven, Belgium 3Proxima Fusion GmbH, Floßergasse 2, 81369 Munich, Germany. Correspondence to: Konstantinos Oikonomidis <EMAIL>. |
| Pseudocode | No | The paper describes the main iteration in equation (2): x^{k+1} = T_{γ,λ}(x^k) := x^k − γ ∇φ*(λ ∇f(x^k)), but does not provide a separate, structured pseudocode block for the algorithm in the main text. |
| Open Source Code | Yes | The code for reproducing the experiments is publicly available at https://github.com/JanQ/nonlinearly-preconditioned-gradient |
| Open Datasets | Yes | In this experiment we consider training a simple four-layer fully connected network with layer dimensions [28 × 28, 128, 64, 32, 32, 10] and ReLU activation functions on a subset of the MNIST dataset (Deng, 2012), using the cross-entropy loss. |
| Dataset Splits | No | The paper mentions using 'a subset (m = 600) of the dataset' for neural network training but does not specify how this subset is further divided into training, validation, or test splits. No explicit percentages, counts, or references to standard splits for reproduction are provided. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | For the isotropic case of (2) we take γ = 5/3 and λ = 1/100, while for the anisotropic one γ = 1/5 and λ = 1/14. [...] We compare the methods generated by φ1(x) = cosh(|x|) − 1, φ2(x) = −|x| − ln(1 − |x|) and the gradient clipping method (Zhang et al., 2020b), which can also be considered an instance of (2) through Example 1.7, for various choices of the stepsizes and the clipping parameters. The results are presented in Figure 4. It can be seen that different combinations of γ and λ lead to different behaviors for the compared methods. |
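The quoted iteration (2) is simple enough to sketch directly. Below is a minimal, hypothetical Python reproduction of the Figure 2 setup (minimizing (1/4) x^4), assuming the iteration x^{k+1} = x^k − γ ∇φ*(λ ∇f(x^k)) and the isotropic parameters γ = 5/3, λ = 1/100 quoted above; the dual maps are derived by standard convex-conjugate calculus from the stated reference functions, so for φ1(x) = cosh(|x|) − 1 one gets ∇φ1*(y) = asinh(y), and for φ2(x) = −|x| − ln(1 − |x|) one gets the soft-clipping map ∇φ2*(y) = y / (1 + |y|). Function and variable names here are illustrative, not from the paper's code.

```python
import math

def npgm(grad_f, dual_map, x0, gamma, lam, iters):
    """Nonlinearly preconditioned gradient method, iteration (2):
    x^{k+1} = x^k - gamma * dphi*(lam * grad_f(x^k))."""
    x = x0
    for _ in range(iters):
        x = x - gamma * dual_map(lam * grad_f(x))
    return x

# Objective f(x) = (1/4) x^4, so grad f(x) = x^3.
grad_f = lambda x: x**3

# phi1(x) = cosh(|x|) - 1  =>  grad phi1*(y) = asinh(y)
dphi1_star = math.asinh
# phi2(x) = -|x| - ln(1 - |x|)  =>  grad phi2*(y) = y / (1 + |y|)  (soft clipping)
dphi2_star = lambda y: y / (1 + abs(y))

# Isotropic parameters quoted in the table: gamma = 5/3, lambda = 1/100.
x1 = npgm(grad_f, dphi1_star, x0=10.0, gamma=5 / 3, lam=1 / 100, iters=200)
x2 = npgm(grad_f, dphi2_star, x0=10.0, gamma=5 / 3, lam=1 / 100, iters=200)
```

Because both dual maps grow sublinearly (logarithmically for asinh, bounded for the soft clip), the update stays stable even from a far-out initialization where ∇f is huge, which is the point of the generalized-smoothness analysis.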