Reinforced In-Context Black-Box Optimization

Authors: Lei Song, Chen-Xiao Gao, Ke Xue, Chenyang Wu, Dong Li, Jianye Hao, Zongzhang Zhang, Chao Qian

IJCAI 2025

Reproducibility

| Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We perform experiments on BBOB synthetic functions, hyper-parameter optimization and robot control problems by using some representatives of heuristic search, EA, and BO as behavior algorithms to generate the offline datasets. The results show that RIBBO can automatically generate sequences of query points related to the user-desired regret across diverse problems, and achieve good performance universally." |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University, China; (2) School of Artificial Intelligence, Nanjing University, China; (3) Huawei Noah's Ark Lab, China; (4) College of Intelligence and Computing, Tianjin University, China |
| Pseudocode | Yes | Algorithm 1: Model Inference with HRR |
| Open Source Code | Yes | "Our code is available at https://github.com/lamda-bbo/RIBBO." |
| Open Datasets | Yes | The paper uses the BBO Benchmarks (BBOB) [Elhara et al., 2019], HPO-B [Arango et al., 2021], and a rover trajectory planning task [Wang et al., 2018]. |
| Dataset Splits | Yes | "For BBOB and rover problems, we sample a set of functions from the task distribution as training and test tasks, while for HPO-B, we use the meta-training/test task splits provided by the authors." |
| Hardware Specification | No | The paper does not specify the hardware used for its experiments (GPU/CPU models, processor speeds, or memory amounts). |
| Software Dependencies | No | The paper does not list the ancillary software with version numbers (e.g., Python 3.8, CPLEX 12.4) needed to replicate the experiments. |
| Experiment Setup | No | The paper states, "The model architecture and hyper-parameters are consistent across these problems, with the average performance and standard deviation being reported after execution using distinct random seeds. Details of the model are given in Appendix A." However, the main text does not explicitly provide the specific hyperparameter values or training configurations. |
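The Dataset Splits row describes sampling a set of functions from a task distribution and partitioning them into training and test tasks. A minimal sketch of that kind of split, assuming a toy quadratic task family as a stand-in for BBOB instances (the function names, task counts, and split fraction below are illustrative, not from the paper):

```python
import random

def sample_task(seed):
    """Hypothetical stand-in for drawing one function instance
    from the task distribution (here: a randomly shifted quadratic)."""
    rng = random.Random(seed)
    shift = [rng.uniform(-5, 5) for _ in range(2)]
    return lambda x: sum((xi - si) ** 2 for xi, si in zip(x, shift))

def make_splits(n_tasks=100, test_frac=0.2, seed=0):
    """Sample n_tasks function instances, then partition them
    into disjoint training and test task sets."""
    rng = random.Random(seed)
    task_seeds = list(range(n_tasks))
    rng.shuffle(task_seeds)
    n_test = int(n_tasks * test_frac)
    test_seeds, train_seeds = task_seeds[:n_test], task_seeds[n_test:]
    train = [sample_task(s) for s in train_seeds]
    test = [sample_task(s) for s in test_seeds]
    return train, test

train_tasks, test_tasks = make_splits()
print(len(train_tasks), len(test_tasks))  # 80 20
```

For HPO-B, by contrast, no such sampling is needed: the benchmark authors ship fixed meta-training/test splits, which the paper reuses directly.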