Refining Adaptive Zeroth-Order Optimization at Ease
Authors: Yao Shu, Qixin Zhang, Kun He, Zhongxiang Dai
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, including synthetic problems (Sec. 6.1), black-box adversarial attack (Sec. 6.2), and memory-efficient fine-tuning of large language models (LLMs) (Sec. 6.3), we demonstrate that R-AdaZO consistently outperforms existing methods in practice, exhibiting superior convergence and indicating that R-AdaZO offers an improved solution for real-world ZO optimization challenges. |
| Researcher Affiliation | Academia | (1) Hong Kong University of Science and Technology (Guangzhou); (2) Nanyang Technological University; (3) Huazhong University of Science and Technology; (4) The Chinese University of Hong Kong, Shenzhen. Correspondence to: Zhongxiang Dai <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: ZO-AdaMM; Algorithm 2: R-AdaZO |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | "black-box adversarial attack on an image from the MNIST dataset (Lecun et al., 1998)" … "fine-tuning of large language models (Malladi et al., 2023; Zhang et al., 2024b) motivates our use of this setting to further demonstrate the superiority of R-AdaZO over other adaptive ZO optimization algorithms (experimental setup in Appx. B.3)." … "The results in Fig. 3(a-c) show that, for both OPT-1.3B and OPT-13B models (Zhang et al., 2022) and datasets SST-2 (Socher et al., 2013) and Copa (Roemmele et al., 2011)" |
| Dataset Splits | No | The paper mentions using well-known public datasets like MNIST, SST-2, and Copa, and refers to fine-tuning LLMs with LoRA adapters, but does not explicitly provide details about training/test/validation splits for these datasets within the main text or appendices. For synthetic functions, dataset splits are not applicable. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions general concepts like 'memory-efficient fine-tuning'. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., programming language versions, library versions, specific frameworks) with version numbers. It mentions using 'LoRA adapters' but without versioning or specific software details. |
| Experiment Setup | Yes | Synthetic functions (Appx. B.1): "For a fair comparison, we employ the same initialization and hyperparameters: β1 = 0.9, β2 = 0.99 and K = 10, η = 0.001, µ = 0.005, for all methods." Black-box adversarial attack (Appx. B.2): "we employ the same hyperparameters: β1 = 0.9, β2 = 0.99 and K = 2, η = 0.01, µ = 0.005, for all methods." Memory-efficient LLM fine-tuning (Appx. B.3): "we employ the same hyperparameters: β1 = 0.9, β2 = 0.99 and K = 1, η = 0.00005, µ = 0.001, for all methods." |
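To make the quoted hyperparameters concrete, below is a minimal sketch of an adaptive zeroth-order update in the ZO-AdaMM style: a two-point random-direction gradient estimate (averaged over K directions with smoothing parameter µ) fed into an Adam-style update with moments β1, β2 and learning rate η. This is an illustrative reconstruction under common ZO conventions, not the authors' released implementation; function names and the choice of Gaussian perturbations are assumptions.

```python
import numpy as np

def zo_grad_estimate(f, x, mu=0.005, K=10, rng=None):
    """Two-point zeroth-order gradient estimate, averaged over K random
    Gaussian directions (a common ZO estimator; the paper's exact
    estimator may differ)."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(K):
        u = rng.standard_normal(x.shape)
        # Finite-difference directional derivative times the direction.
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / K

def zo_adamm_step(x, g, state, eta=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam-style parameter update driven by a ZO gradient estimate
    (a ZO-AdaMM-like sketch)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * g          # first moment
    v = beta2 * v + (1 - beta2) * g * g      # second moment
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    x = x - eta * m_hat / (np.sqrt(v_hat) + eps)
    return x, (m, v, t)
```

On a simple objective such as f(x) = ‖x‖², iterating these two functions with the paper's synthetic-function settings (β1 = 0.9, β2 = 0.99, K = 10, µ = 0.005) drives the loss down using only function evaluations, which is the point of the ZO setting.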