Stochastic Interior-Point Methods for Smooth Conic Optimization with Applications

Authors: Chuan He, Zhanwang Deng

JMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We now conduct numerical experiments to evaluate the performance of our proposed SIPMs. For SIPM-ME, we consider two versions: SIPM-ME+ with increasing batch sizes and SIPM-ME1 with a fixed batch size. We compare our SIPMs against a deterministic variant of Algorithm 1 with full-batch gradients (IPM-FG) and other popular methods on robust linear regression (Section 4.1), multi-task relationship learning (Section 4.2), and clustering data streams (Section 4.3). We evaluate the approximate solutions found by our SIPMs using two measures: the relative objective value f(x_k)/f(x_0), and the relative estimated stationary error ‖m_k + Aᵀλ_k‖_{x_k} / ‖m_0 + Aᵀλ_0‖_{x_0}.
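The two progress measures quoted above can be sketched in a few lines. This is an illustrative Python helper, not the paper's Matlab code: the function and argument names are invented, and the plain Euclidean norm stands in for the local norm ‖·‖_{x_k} used in the paper.

```python
import numpy as np

def relative_measures(f, grad_est, A, x_hist, lam_hist):
    """Track a run's relative objective value and relative stationary error.

    f        : objective function, x -> float
    grad_est : stochastic gradient estimator, x -> m_k
    A        : constraint matrix (linear constraints A x = b)
    x_hist, lam_hist : lists of primal iterates x_k and multipliers lambda_k
    (All names here are illustrative placeholders.)
    """
    f0 = f(x_hist[0])
    # stationarity residual r_k = m_k + A^T lambda_k, here measured in the
    # Euclidean norm as a stand-in for the paper's local norm at x_k
    r0 = np.linalg.norm(grad_est(x_hist[0]) + A.T @ lam_hist[0])
    rel_obj = [f(x) / f0 for x in x_hist]
    rel_stat = [np.linalg.norm(grad_est(x) + A.T @ lam) / r0
                for x, lam in zip(x_hist, lam_hist)]
    return rel_obj, rel_stat
```

Both sequences start at 1 by construction, so curves for different methods are directly comparable.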
Researcher Affiliation Academia Chuan He EMAIL Department of Mathematics, Linköping University, Linköping, SE-581 83, Sweden; Zhanwang Deng EMAIL Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
Pseudocode Yes In what follows, we propose an SIPM framework in Algorithm 1 for solving problem (3). Subsequently, we will employ four distinct stochastic estimators to construct {m_k}_{k≥0}, with specific schemes provided in (16), (22), (29), and (38), respectively. We remark that computations involving ∇²B(x_k)⁻¹ can be performed efficiently for common nonlinear cones K. For example, when K is the second-order cone, ∇²B(x_k)⁻¹ can be computed analytically (e.g., see Alizadeh and Goldfarb (2003)), and (A∇²B(x_k)⁻¹Aᵀ)⁻¹v for any v ∈ ℝᵐ can be efficiently evaluated using Cholesky factorization. When K is the semidefinite cone, we have that ∇²B(X_k)⁻¹[V] = X_k V X_k holds for any n×n symmetric matrix V, and that A∇²B(X_k)⁻¹Aᵀ can be efficiently computed by exploiting the sparsity of A (see Toh et al. (1999) for details). Algorithm 1 An SIPM framework. Input: starting point x_0 ∈ int Ω, nonincreasing step sizes {η_k}_{k≥0} ⊂ (0, s_η] with s_η ∈ (0, 1), nonincreasing barrier parameters {µ_k}_{k≥0} ⊂ (0, 1].
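The semidefinite-cone identity quoted above is easy to check numerically, assuming the standard log-det barrier B(X) = −log det X (its Hessian acts as H ↦ X⁻¹HX⁻¹, so the inverse Hessian acts as V ↦ XVX). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)   # symmetric positive definite iterate X_k
V = rng.standard_normal((n, n))
V = (V + V.T) / 2             # arbitrary symmetric direction

Xinv = np.linalg.inv(X)
H = X @ V @ X                 # claimed inverse-Hessian image of V
# applying the log-det Hessian, H -> Xinv @ H @ Xinv, should recover V
assert np.allclose(Xinv @ H @ Xinv, V)
```

So inverting the barrier Hessian on the semidefinite cone costs only two matrix products, with no linear solve, which is what makes the A∇²B(X_k)⁻¹Aᵀ computation cheap when A is sparse.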
Open Source Code Yes All experiments are carried out using Matlab 2024b on a standard PC with 3.20 GHz AMD R7 5800H microprocessor with 16GB of memory. The code to reproduce our numerical results is available at https://github.com/ChuanH6/SIPM.
Open Datasets Yes We consider two typical regression datasets, namely, wine-quality and energy-efficiency, from the UCI repository (see archive.ics.uci.edu/datasets).
Dataset Splits No The paper mentions simulating missing values by randomly deleting 25% of the features in 50% of the training samples, creating subsets for multi-task learning, and applying "drifts" to data streams. However, it does not explicitly provide standard training/validation/test splits with percentages, absolute counts, or references to predefined splits for model evaluation reproducibility.
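The corruption procedure mentioned above (deleting 25% of the features in 50% of the training samples) can be sketched as follows. This is a plausible reading of the description, not the paper's code; the function name, NaN encoding of deleted entries, and per-row uniform feature choice are all assumptions.

```python
import numpy as np

def simulate_missing(X_train, frac_rows=0.5, frac_feats=0.25, seed=0):
    """Set a random 25% of features to NaN in a random 50% of rows.

    A sketch of the missing-value simulation described in the review;
    the paper does not specify the exact deletion mechanism here.
    """
    rng = np.random.default_rng(seed)
    X = X_train.astype(float).copy()
    n, d = X.shape
    rows = rng.choice(n, size=int(frac_rows * n), replace=False)
    for i in rows:
        cols = rng.choice(d, size=int(frac_feats * d), replace=False)
        X[i, cols] = np.nan   # mark deleted features as missing
    return X
```

Fixing the seed makes the corrupted copy reproducible, which partly compensates for the absence of published splits.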
Hardware Specification Yes All experiments are carried out using Matlab 2024b on a standard PC with 3.20 GHz AMD R7 5800H microprocessor with 16GB of memory.
Software Dependencies Yes All experiments are carried out using Matlab 2024b on a standard PC with 3.20 GHz AMD R7 5800H microprocessor with 16GB of memory.
Experiment Setup Yes For SIPMs, we set the barrier function as B(u, t) = ln(t2 u 2), associated with the second-order cone Qd+1 .= {(u, t) Rd R+ : u t}. For SIPM-ME1, SIPM-PM, SIPM-EM, SIPM-RM, and SALM-RM, we set the maximum number of epochs as 10, 000, and the batch size as 200 and 500 for the wine-quality and energy-efficiency datasets, respectively. For SIPM-ME+, we initialize the batch size as 1 and increase it by 1 per iteration. For a fair comparison, we set the maximum number of iterations for SIPM-ME+ and IPM-FG such that the total number of data points used during training equals that of the other methods in 10, 000 epochs. For all methods, we set the initial point (w0, v0, θ0) to (0, 1, 1). We set the other hyperparameters including step sizes, momentum parameters, and barrier parameters as diminishing sequences of the form {(k + 1) α}k 0, with the exponent α tuned via grid search to best suit each method in terms of computational performance.
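The second-order-cone barrier and the diminishing schedules described above can be sketched directly. The function names are illustrative, and returning +inf outside the cone's interior is an assumed convention:

```python
import numpy as np

def soc_barrier(u, t):
    """Barrier B(u, t) = -ln(t^2 - ||u||^2) for the second-order cone
    Q^{d+1} = {(u, t): ||u|| <= t}; finite only in the interior."""
    s = t**2 - float(np.dot(u, u))
    return -np.log(s) if s > 0 else np.inf

def schedule(alpha, num_iters):
    """Diminishing sequence (k+1)^(-alpha), k = 0, 1, ..., used for step
    sizes, momentum parameters, and barrier parameters (alpha tuned by
    grid search in the experiments)."""
    return [(k + 1) ** (-alpha) for k in range(num_iters)]
```

Every such schedule starts at 1 and decays polynomially, so tuning reduces to a one-dimensional grid search over α per method.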