Scalable Acceleration for Classification-Based Derivative-Free Optimization
Authors: Tianyi Han, Jingya Li, Zhipeng Guo, Yuan Jin
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the synthetic functions as well as black-box tuning for language-model-as-a-service demonstrate empirically the efficiency of RACE-CARS. An ablation experiment on the introduced hyper-parameters is also conducted, revealing the mechanism of RACE-CARS and putting forward an empirical hyper-parameter tuning guidance. |
| Researcher Affiliation | Industry | Tianyi Han*, Jingya Li, Zhipeng Guo, Yuan Jin* Beijing Supreium Technology, Haidian District, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Batch-Mode Classification-Based Optimization Algorithm; Algorithm 2: RACOS; Algorithm 3: Sequential-Mode Classification-Based Optimization Algorithm; Algorithm 4: Accelerated Sequential-Mode Classification Based Optimization Algorithm |
| Open Source Code | No | The paper provides a link only for a third-party tool used for comparison: "Code can be found in https://github.com/txsun1997/Black-Box-Tuning". There is no statement or link indicating the release of the authors' own implementation of RACE-CARS. |
| Open Datasets | Yes | We evaluate performance on datasets SST-2 (Socher et al. 2013), Yelp Polarity and AG's News (Zhang, Zhao, and LeCun 2015), and RTE (Wang et al. 2018a). |
| Dataset Splits | Yes | In this part we follow the experiments designed by Sun et al. (2022), where the language understanding task is formulated as a classification task predicting, for a batch of PTM-modified input texts X, the labels Y in the PTM vocabulary; that is, we tune the prompt so that the black-box PTM inference API f takes a continuous prompt p satisfying Y = f(p; X). ... We assess the algorithms based on the mean and deviation of training loss, training accuracy, development loss, and development accuracy. The SST-2 dataset results are highlighted in Figure 3, with additional findings for Yelp Polarity, AG's News, and RTE detailed in the appendix. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or processor types used for running the experiments. It mentions using 'RoBERTa' as a backbone model, which implies computational resources, but no specifications are given. |
| Software Dependencies | No | The paper mentions "RoBERTa (Liu et al. 2019a) serving as the backbone model" but does not specify any software libraries, frameworks, or their version numbers required to replicate the experiments. |
| Experiment Setup | Yes | Region shrinking rate is configured to be γ = 0.9 and 0.95, with shrinking frequency of ρ = 0.01 and 0.001 for n = 50, 500, respectively. ... For our tests, the shrinking rate is γ = 0.7, with shrinking frequency of ρ = 0.002. Each algorithm is repeated 5 times independently with unique seeds. ... In our experimental setup, we configure the search space dimension to d = 500 and the prompt length to 50, with RoBERTa (Liu et al. 2019a) serving as the backbone model. We evaluate performance on datasets SST-2 (Socher et al. 2013), Yelp Polarity and AG's News (Zhang, Zhao, and LeCun 2015), and RTE (Wang et al. 2018a). With a fixed API call budget of T = 8000 |
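The hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration sketch for anyone attempting replication. This is not the authors' code; the dictionary keys and structure are illustrative assumptions, only the numeric values come from the paper.

```python
# Hypothetical configuration summarizing the hyper-parameters reported in the
# paper. Key names are invented for readability; values are as quoted above.

# Synthetic-function experiments: shrinking rate gamma and frequency rho
# depend on the problem dimension n.
synthetic_config = {
    50:  {"gamma": 0.9,  "rho": 0.01},
    500: {"gamma": 0.95, "rho": 0.001},
}

# Black-box prompt-tuning experiments (language-model-as-a-service setting).
prompt_tuning_config = {
    "gamma": 0.7,              # region shrinking rate
    "rho": 0.002,              # shrinking frequency
    "search_space_dim": 500,   # d = 500
    "prompt_length": 50,
    "api_call_budget": 8000,   # T = 8000
    "independent_runs": 5,     # repeated with unique seeds
    "backbone": "RoBERTa",
    "datasets": ["SST-2", "Yelp Polarity", "AG's News", "RTE"],
}
```

Such a sketch makes the missing reproducibility details concrete: everything not listed here (hardware, library versions, seeds) is unspecified in the paper.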