Refining Adaptive Zeroth-Order Optimization at Ease
Authors: Yao Shu, Qixin Zhang, Kun He, Zhongxiang Dai
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, including synthetic problems (Sec. 6.1), black-box adversarial attack (Sec. 6.2), and memory-efficient fine-tuning of large language models (LLMs) (Sec. 6.3), we demonstrate that R-AdaZO consistently outperforms existing methods in practice, exhibiting superior convergence and indicating that R-AdaZO offers an improved solution for real-world ZO optimization challenges. |
| Researcher Affiliation | Academia | (1) Hong Kong University of Science and Technology (Guangzhou); (2) Nanyang Technological University; (3) Huazhong University of Science and Technology; (4) The Chinese University of Hong Kong, Shenzhen. Correspondence to: Zhongxiang Dai <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: ZO-AdaMM; Algorithm 2: R-AdaZO |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | "black-box adversarial attack on an image from the MNIST dataset (Lecun et al., 1998)" … "fine-tuning of large language models (Malladi et al., 2023; Zhang et al., 2024b) motivates our use of this setting to further demonstrate the superiority of R-AdaZO over other adaptive ZO optimization algorithms (experimental setup in Appx. B.3)." … "The results in Fig. 3(a-c) show that, for both OPT-1.3B and OPT-13B models (Zhang et al., 2022) and datasets SST-2 (Socher et al., 2013) and Copa (Roemmele et al., 2011)" |
| Dataset Splits | No | The paper mentions using well-known public datasets like MNIST, SST-2, and Copa, and refers to fine-tuning LLMs with LoRA adapters, but does not explicitly provide details about training/test/validation splits for these datasets within the main text or appendices. For synthetic functions, dataset splits are not applicable. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions general concepts like 'memory-efficient fine-tuning'. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., programming language versions, library versions, specific frameworks) with version numbers. It mentions using 'LoRA adapters' but without versioning or specific software details. |
| Experiment Setup | Yes | Synthetic functions (Appx. B.1): "For a fair comparison, we employ the same initialization and hyperparameters: β1 = 0.9, β2 = 0.99 and K = 10, η = 0.001, µ = 0.005, for all methods." Black-box adversarial attack (Appx. B.2): "we employ the same hyperparameters: β1 = 0.9, β2 = 0.99 and K = 2, η = 0.01, µ = 0.005, for all methods." Memory-efficient LLM fine-tuning (Appx. B.3): "we employ the same hyperparameters: β1 = 0.9, β2 = 0.99 and K = 1, η = 0.00005, µ = 0.001, for all methods." |
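To make the quoted hyperparameters concrete, below is a minimal sketch of an adaptive zeroth-order update in the ZO-AdaMM style: a two-point random-direction gradient estimate (averaged over K directions with smoothing parameter µ) fed into an Adam-style update with moments β1, β2 and learning rate η. This is an illustrative reconstruction under common ZO conventions, not the authors' released implementation; function names and the choice of Gaussian perturbations are assumptions.

```python
import numpy as np

def zo_grad_estimate(f, x, mu=0.005, K=10, rng=None):
    """Two-point zeroth-order gradient estimate, averaged over K random
    Gaussian directions (a common ZO estimator; the paper's exact
    estimator may differ)."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(K):
        u = rng.standard_normal(x.shape)
        # Finite-difference directional derivative times the direction.
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / K

def zo_adamm_step(x, g, state, eta=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam-style parameter update driven by a ZO gradient estimate
    (a ZO-AdaMM-like sketch)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * g          # first moment
    v = beta2 * v + (1 - beta2) * g * g      # second moment
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    x = x - eta * m_hat / (np.sqrt(v_hat) + eps)
    return x, (m, v, t)
```

On a simple objective such as f(x) = ‖x‖², iterating these two functions with the paper's synthetic-function settings (β1 = 0.9, β2 = 0.99, K = 10, µ = 0.005) drives the loss down using only function evaluations, which is the point of the ZO setting.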