Exploiting Curvature in Online Convex Optimization with Delayed Feedback
Authors: Hao Qiu, Emmanuel Esposito, Mengxiao Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, in Section 6, we implement all our proposed algorithms and conduct experiments to validate our theoretical results across multiple delayed settings and loss functions with different curvature properties. We also compare our methods with existing approaches to demonstrate their effectiveness. . . . Experimental results. Figure 1 shows the mean cumulative regret and its standard deviation over 20 rounds for the instances with strong convexity, exp-concavity, and OLR under the two previously mentioned delay regimes. |
| Researcher Affiliation | Academia | ¹Università degli Studi di Milano, ²University of Iowa. Correspondence to: Hao Qiu <EMAIL>, Emmanuel Esposito <EMAIL>, Mengxiao Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Delayed FTRL for strongly convex functions; Algorithm 2: Delayed ONS for exp-concave functions; Algorithm 3: Delayed VAW forecaster with clipping |
| Open Source Code | Yes | The code for the experiments is available at https://github.com/haoqiu95/DOCO. |
| Open Datasets | Yes | In this section, we evaluate the performance of the proposed algorithms on three types of loss functions in the delayed OCO setting. All experiments are conducted over T = 10000 rounds and results are averaged over 20 independent trials. . . . We consider a real-world dataset mg scale from the LIBSVM repository (Chang & Lin, 2011). |
| Dataset Splits | No | The paper mentions using the 'mg scale' dataset and details the experimental setup, but it does not explicitly specify training/test/validation splits with percentages, sample counts, or refer to standard predefined splits for the dataset used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions the length of the experiments ('T = 10000 rounds'). |
| Software Dependencies | No | The paper mentions using the 'LIBSVM repository (Chang & Lin, 2011)' as a source for a dataset, but it does not specify any software libraries or tools with their respective version numbers that were used for implementing or running the algorithms. |
| Experiment Setup | Yes | All experiments are conducted over T = 10000 rounds and results are averaged over 20 independent trials. To showcase the advantage of our algorithms, we consider two delay regimes. In the first case, each delay d_t is independently and uniformly sampled from the set {0, 1, . . . , 5}, thus leading to E[d_tot] = Θ(T) and E[σ_max] ≤ E[d_max] ≤ 5. In the second case, we define p = T^(−1/3) = 0.1; then, for each t, d_t is sampled from the same distribution with probability 1 − p, and it is set to d_t = T − t with probability p. . . . Strongly convex loss: we consider the strongly convex losses f_t(x) = (1/2)(⟨z_t, x⟩ − y_t)² + (1/2)‖x‖₂². The feasible set is the ball X = {x ∈ R⁵ : ‖x‖₂ ≤ 2}. Each coordinate of the feature vector z_t ∈ R⁵ at round t is uniformly chosen from [−1, 1], while y_t = ⟨z_t, 1⟩ + ϵ_t, where ϵ_t is i.i.d. standard Gaussian noise. . . . The loss functions we consider for the exp-concave case are f_t(x) = (1/2)(⟨z_t, x⟩ − y_t)². The other configurations are the same as in the strongly convex case. . . . We still consider the loss function f_t(x) = (1/2)(⟨z_t, x⟩ − y_t)² for all t ∈ [T], the same one as used in the exp-concave setting. The only difference is that the action space is now unconstrained (X = R⁵). . . . We also designed a non-stationary environment as follows. The generation processes for the feature vectors, as well as the definition of the loss function, remain the same as the environment in Section 6. However, we modified the generation of the label y_t: y_t = ⟨z_t, θ_t⟩ + ϵ_t (Eq. 60), where the latent vector θ_t alternates every 30 rounds between the two vectors 1 and 0. This periodic change introduces non-stationarity, reflecting scenarios where the optimal action shifts over time. The delay d_t is independently sampled from a distribution that alternates every 30 rounds between a geometric distribution with success probability T^(−1/3) and a uniform distribution over the set {0, 1, . . . , 5}. Additionally, we modify the noise term ϵ_t, inspired by Xu & Zeevi (2023): we flatten an abstract art piece by Jackson Pollock and take consecutive grayscale values in [0, 1] as the noise ϵ_t. |
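The delay regimes and the strongly convex loss described in the Experiment Setup cell can be sketched in a few lines of NumPy. This is a minimal reconstruction from the quoted description, not the authors' released code (see their repository for that); variable names and the use of p = 0.1 for regime 2 are assumptions taken directly from the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 10_000, 5  # horizon and feature dimension, as quoted from the paper

# Delay regime 1: d_t uniform on {0, ..., 5}, so E[d_tot] = Theta(T).
delays_uniform = rng.integers(0, 6, size=T)

# Delay regime 2: with probability p the delay is maximal (d_t = T - t),
# otherwise it is drawn uniformly as above; p = 0.1 as in the quote.
p = 0.1
maximal = rng.random(T) < p
delays_mixed = np.where(maximal, T - np.arange(T), rng.integers(0, 6, size=T))

# Strongly convex losses f_t(x) = 0.5*(<z_t, x> - y_t)^2 + 0.5*||x||_2^2,
# with z_t uniform on [-1, 1]^5 and y_t = <z_t, 1> + eps_t (Gaussian noise).
Z = rng.uniform(-1.0, 1.0, size=(T, d))
eps = rng.standard_normal(T)
y = Z @ np.ones(d) + eps

def loss(t: int, x: np.ndarray) -> float:
    """Strongly convex per-round loss at round t (hypothetical helper)."""
    return 0.5 * (Z[t] @ x - y[t]) ** 2 + 0.5 * float(x @ x)
```

A learner in this setting would only observe the gradient (or loss) of round t at round t + d_t, which is what makes the dependence on d_tot and d_max in the regret bounds relevant.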