Reinforced In-Context Black-Box Optimization
Authors: Lei Song, Chen-Xiao Gao, Ke Xue, Chenyang Wu, Dong Li, Jianye Hao, Zongzhang Zhang, Chao Qian
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments on BBOB synthetic functions, hyper-parameter optimization and robot control problems by using some representatives of heuristic search, EA, and BO as behavior algorithms to generate the offline datasets. The results show that RIBBO can automatically generate sequences of query points related to the user-desired regret across diverse problems, and achieve good performance universally. |
| Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, China; 2 School of Artificial Intelligence, Nanjing University, China; 3 Huawei Noah's Ark Lab, China; 4 College of Intelligence and Computing, Tianjin University, China |
| Pseudocode | Yes | Algorithm 1 Model Inference with HRR |
| Open Source Code | Yes | Our code is available at https://github.com/lamda-bbo/RIBBO. |
| Open Datasets | Yes | We use BBO Benchmarks (BBOB) [Elhara et al., 2019], HPO-B [Arango et al., 2021], and the rover trajectory planning task [Wang et al., 2018]. |
| Dataset Splits | Yes | For BBOB and rover problems, we sample a set of functions from the task distribution as training and test tasks, while for HPO-B, we use the meta-training/test task splits provided by the authors. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | No | The paper states, 'The model architecture and hyper-parameters are consistent across these problems, with the average performance and standard deviation being reported after execution using distinct random seeds. Details of the model are given in Appendix A.' However, the main text does not explicitly provide the specific hyperparameter values or training configurations. |