Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Complexity of Highly Parallel Non-Smooth Convex Optimization
Authors: Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension d. In this paper we study the extension of this result to the parallel optimization setting. Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer poly(d) gradient queries in parallel. We show that in this case gradient descent is optimal only up to Õ(√d) rounds of interactions with the oracle. The lower bound improves upon a decades-old construction by Nemirovski which proves optimality only up to d^{1/3} rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after √d rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization. |
| Researcher Affiliation | Collaboration | Sébastien Bubeck (Microsoft Research, EMAIL); Qijia Jiang (Stanford University, EMAIL); Yin Tat Lee (University of Washington & Microsoft Research, EMAIL); Yuanzhi Li (Stanford University, EMAIL); Aaron Sidford (Stanford University, EMAIL) |
| Pseudocode | Yes | Algorithm 1: Compute vector field approximating ∇g; Algorithm 2: Approximate minimization of g(y) + Φ(‖y − c‖) |
| Open Source Code | No | The paper is theoretical and does not contain any explicit statements or links indicating that open-source code for the described methodology is provided or made available. |
| Open Datasets | No | The paper describes theoretical results and algorithms. It refers to abstract functions like a 'random function f' or 'convolved function' for theoretical analysis and proofs, but does not mention the use of any specific real-world or publicly available datasets with access information (link, DOI, or formal citation). |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments on datasets, thus it does not provide any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and complexity. It does not provide any specific details about the hardware used, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers that would be needed to replicate experiments. |
| Experiment Setup | No | The paper is theoretical and does not involve empirical experiments. Therefore, it does not provide details on experimental setup, such as hyperparameters, training configurations, or system-level settings. |
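The paper's baseline is classical subgradient descent interacting with a gradient oracle, one query per round. Since no code accompanies the paper, the following is only an illustrative sketch of that baseline on a simple non-smooth convex function (f(x) = ‖x‖∞); the function names and step-size choice here are our own, not the paper's.

```python
import numpy as np

def subgradient_method(subgrad, x0, rounds, eta):
    # Basic subgradient descent: one oracle query per round. This is the
    # baseline whose round complexity the paper analyzes; this code is an
    # illustrative sketch, not the paper's algorithm.
    x = x0.astype(float).copy()
    for _ in range(rounds):
        x = x - eta * subgrad(x)
    return x

def subgrad_linf(x):
    # A subgradient of the non-smooth convex function f(x) = ||x||_inf:
    # sign(x_i) on a coordinate of maximum absolute value.
    g = np.zeros_like(x)
    i = np.argmax(np.abs(x))
    g[i] = np.sign(x[i])
    return g

x = subgradient_method(subgrad_linf, np.ones(5), rounds=200, eta=0.05)
# With a fixed step size eta, the iterates settle within O(eta) of the
# minimizer 0, so ||x||_inf ends up on the order of eta.
```

With a constant step size the method does not converge exactly; it hovers within a band of width roughly eta around the optimum, which is the standard behavior for subgradient methods on non-smooth objectives.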
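The oracle model in the abstract allows poly(d) gradient queries to be answered in a single round, with complexity counted in rounds rather than total queries. A minimal sketch of such a batched oracle, assuming a hypothetical `parallel_gradient_oracle` helper (this name and the example objective are ours, not the paper's):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_gradient_oracle(grad, queries):
    # One interaction round in the highly parallel model: the oracle answers
    # an entire batch of (up to poly(d)) gradient queries simultaneously.
    # Complexity is measured in rounds, not in total gradient evaluations.
    with ThreadPoolExecutor() as ex:
        return list(ex.map(grad, queries))

# Example round: gradients of the smooth function f(x) = ||x||^2, i.e. 2x,
# at a batch of query points.
grad_sq = lambda x: 2.0 * x
batch = [np.full(3, v) for v in (0.0, 1.0, -2.0)]
answers = parallel_gradient_oracle(grad_sq, batch)
```

The thread pool is only a stand-in for the model's simultaneity: what matters for the lower bounds is that all queries in a batch are issued before any answer is seen.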