Second-Order Min-Max Optimization with Lazy Hessians

Authors: Lesi Chen, Chengchang Liu, Jingzhao Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 NUMERICAL EXPERIMENTS. We conduct our algorithms on a regularized bilinear min-max problem and fairness-aware machine learning tasks. We include EG (Korpelevich, 1976) and NPE (Monteiro & Svaiter, 2012; Bullins & Lai, 2022; Adil et al., 2022; Lin et al., 2022) (which is our algorithm with m = 1) as baselines, since they are the optimal first- and second-order methods for convex-concave minimax problems, respectively. We run the programs on an AMD EPYC 7H12 64-Core Processor. Figures 1 and 2 present the results of the running time against ||F(z)||.
Researcher Affiliation | Academia | Lesi Chen, IIIS, Tsinghua University & Shanghai Qizhi Institute (EMAIL); Chengchang Liu, The Chinese University of Hong Kong (EMAIL); Jingzhao Zhang, IIIS, Tsinghua University & Shanghai Qizhi Institute & Shanghai AI Lab (EMAIL). All listed institutions (Tsinghua University, Shanghai Qizhi Institute, The Chinese University of Hong Kong, Shanghai AI Lab) are academic institutions or academic-affiliated research institutes.
Pseudocode | Yes | Algorithm 1: LEN(z0, T, m, M); Algorithm 2: LEN-restart(z0, T, m, M, S); Algorithm 3: Implementation of LEN(z0, T, m, M).
Open Source Code | Yes | The source code is available at https://github.com/TrueNobility303/LEN.
Open Datasets | Yes | We conduct our algorithms on a regularized bilinear min-max problem considered in the literature (Lin et al., 2022; Jiang et al., 2024)... We conduct the experiments on the datasets heart (n = 270, dx = 13) (Chang & Lin, 2011), adult (n = 32,561, dx = 123) (Chang & Lin, 2011), and law school (n = 20,798, dx = 380) (Le Quy et al., 2022; Liu et al., 2022).
Dataset Splits | No | For the bilinear min-max problem, the paper mentions generating data: 'we generate each element in b as independent Rademacher variables in {-1, +1}'. For fairness-aware machine learning, it mentions '{a_i, b_i, c_i}_{i=1}^n be the training set' but does not specify any explicit training/validation/test splits or their percentages for the datasets used.
Hardware Specification | Yes | We run the programs on an AMD EPYC 7H12 64-Core Processor.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | Yes | For EG, we tune the stepsize in {1, 0.1, 0.01, 0.001}. For LEN, we vary m in {1, 2, 10, 100}. We set λ = γ = 10^-4 and β = 0.5. For EG, we tune the stepsize in {0.1, 0.01, 0.001}. For second-order methods (NPE and LEN), as we do not know the value of ρ in advance, we view it as a hyperparameter and tune it in {1, 10, 100}. We set m = 10 for LEN.
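To make the quoted setup concrete, here is a minimal sketch of the EG baseline (Korpelevich, 1976) on a small regularized bilinear saddle problem. This is not the authors' code: the matrix A, the regularization weight lam, and the stepsize eta = 0.1 (one value from the tuning grid above) are illustrative assumptions; only the generation of b as independent Rademacher variables follows the quoted description.

```python
import math
import random

# Illustrative sketch, not the paper's benchmark: a regularized bilinear
# saddle problem f(x, y) = x^T A y + b^T (x - y) + (lam/2)||x||^2 - (lam/2)||y||^2.
# A and lam are assumptions; b uses independent Rademacher entries as quoted.
random.seed(0)
d = 3
A = [[random.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(d)]
b = [random.choice([-1.0, 1.0]) for _ in range(d)]  # Rademacher variables in {-1, +1}
lam = 0.5

def operator(z):
    """Monotone operator F(z) = (grad_x f, -grad_y f) at z = (x, y)."""
    x, y = z[:d], z[d:]
    gx = [sum(A[i][j] * y[j] for j in range(d)) + b[i] + lam * x[i] for i in range(d)]
    gy = [-sum(A[j][i] * x[j] for j in range(d)) + b[i] + lam * y[i] for i in range(d)]
    return gx + gy

def extragradient(z, eta, T):
    """EG: one extrapolation step, then an update using the midpoint operator."""
    for _ in range(T):
        mid = [zi - eta * fi for zi, fi in zip(z, operator(z))]  # extrapolation
        z = [zi - eta * fi for zi, fi in zip(z, operator(mid))]  # update
    return z

z = extragradient([0.0] * (2 * d), eta=0.1, T=2000)
residual = math.sqrt(sum(fi * fi for fi in operator(z)))
print(residual)  # operator norm at the iterate; near zero at the saddle point
```

NPE and LEN replace these first-order steps with (lazily updated) second-order oracle calls, which is why the paper treats the Hessian-reuse period m and the regularity parameter ρ as tunable hyperparameters.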