Reinforced In-Context Black-Box Optimization
Authors: Lei Song, Chen-Xiao Gao, Ke Xue, Chenyang Wu, Dong Li, Jianye Hao, Zongzhang Zhang, Chao Qian
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments on BBOB synthetic functions, hyper-parameter optimization and robot control problems by using some representatives of heuristic search, EA, and BO as behavior algorithms to generate the offline datasets. The results show that RIBBO can automatically generate sequences of query points related to the user-desired regret across diverse problems, and achieve good performance universally. |
| Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, China; 2 School of Artificial Intelligence, Nanjing University, China; 3 Huawei Noah's Ark Lab, China; 4 College of Intelligence and Computing, Tianjin University, China |
| Pseudocode | Yes | Algorithm 1 Model Inference with HRR |
| Open Source Code | Yes | Our code is available at https://github.com/lamda-bbo/RIBBO. |
| Open Datasets | Yes | We use BBO Benchmarks (BBOB) [Elhara et al., 2019], HPO-B [Arango et al., 2021], and the rover trajectory planning task [Wang et al., 2018]. |
| Dataset Splits | Yes | For BBOB and rover problems, we sample a set of functions from the task distribution as training and test tasks, while for HPO-B, we use the meta-training/test task splits provided by the authors. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | No | The paper states, 'The model architecture and hyper-parameters are consistent across these problems, with the average performance and standard deviation being reported after execution using distinct random seeds. Details of the model are given in Appendix A.' However, the main text does not explicitly provide the specific hyperparameter values or training configurations. |