Meta-Black-Box-Optimization through Offline Q-function Learning

Authors: Zeyuan Ma, Zhiguang Cao, Zhou Jiang, Hongshu Guo, Yue-Jiao Gong

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive benchmarking, we observe that Q-Mamba achieves competitive or even superior performance to prior online/offline baselines, while significantly improving the training efficiency of existing online baselines." "Experimental results show that our Q-Mamba effectively achieves competitive or even superior optimization performance to prior online/offline learning baselines, while consuming at most half training budget of the online baselines."
Researcher Affiliation | Academia | "South China University of Technology, China; Singapore Management University, Singapore. Correspondence to: Yue-Jiao Gong <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Pseudo code of Alg0; Algorithm 2: Pseudo code of Alg1; Algorithm 3: Pseudo code of Alg2"
Open Source Code | Yes | "We provide sourcecodes of Q-Mamba online."
Open Datasets | Yes | "A common choice of P in existing MetaBBO works is the CoCo BBOB Testsuites (Hansen et al., 2021), which contains 24 basic synthetic functions, each can be extended to numerous problem instances by randomly rotating and shifting the decision variables."
Dataset Splits | Yes | "A common choice of P in existing MetaBBO works is the CoCo BBOB Testsuites (Hansen et al., 2021), which contains 24 basic synthetic functions... We divide it into 16 problem instances for training and 8 problem instances for testing."
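The reported 16/8 split of the 24 BBOB basic functions can be sketched as below. This is an illustrative assumption only: the quoted passage does not state which functions fall in which partition, so the seeded random assignment and the `split_bbob` helper are ours, not the authors'.

```python
import random

def split_bbob(seed=0, n_functions=24, n_train=16):
    """Hypothetical partition of BBOB functions f1..f24 into a
    16-function training set and an 8-function testing set."""
    rng = random.Random(seed)
    ids = list(range(1, n_functions + 1))  # BBOB function indices f1..f24
    rng.shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])

train_ids, test_ids = split_bbob()
```

Each retained function can then be expanded into many training instances by random rotation and shifting of the decision variables, as the quoted passage describes.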
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions general terms like 'training' and 'inferring time' without specifying the hardware used for these operations.
Software Dependencies | No | The paper mentions using 'AdamW' as an optimizer and 'Mamba-block' from a 'Mamba repo' (with a URL), but it does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA, which are crucial for reproducibility.
Experiment Setup | Yes | "We use AdamW with a learning rate 5e-3 to minimize the expectation training objective E_τ[J(τ|θ)]. All baselines are trained for 300 epochs with batch size 64. In this paper, we set µ = 0.5 to strike a good balance. We additionally add a weight β (we set β = 10 in this paper) on the last action dimension... We set λ = 1 in this paper to strike a good balance. We represent the M = 16 action bins of each hyper-parameter A_i in A by 5-bit binary coding: 00000 to 01111. ...the total optimization steps for the low-level optimization is set as T = 500."
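The 5-bit bin coding mentioned in the setup (M = 16 action bins per hyper-parameter, codes 00000 to 01111) can be sketched as follows. The function names are ours; the paper is only quoted as stating the bin count and the bit-string range.

```python
M = 16  # action bins per hyper-parameter, as stated in the setup

def encode_action_bin(index, width=5):
    """Map a bin index 0..15 to its fixed-width 5-bit binary code,
    '00000' through '01111' (the leading bit is always 0 for 16 bins)."""
    assert 0 <= index < M, "bin index out of range"
    return format(index, f"0{width}b")

def decode_action_bin(code):
    """Recover the bin index from its binary code string."""
    return int(code, 2)
```

Note that 16 bins would fit in 4 bits; the quoted 5-bit representation leaves the high bit unused (always 0), which the sketch reproduces as stated.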