Meta-Black-Box-Optimization through Offline Q-function Learning
Authors: Zeyuan Ma, Zhiguang Cao, Zhou Jiang, Hongshu Guo, Yue-Jiao Gong
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive benchmarking, we observe that Q-Mamba achieves competitive or even superior performance to prior online/offline baselines, while significantly improving the training efficiency of existing online baselines. Experimental results show that our Q-Mamba effectively achieves competitive or even superior optimization performance to prior online/offline learning baselines, while consuming at most half the training budget of the online baselines. |
| Researcher Affiliation | Academia | ¹South China University of Technology, China; ²Singapore Management University, Singapore. Correspondence to: Yue-Jiao Gong <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Pseudo code of Alg0; Algorithm 2: Pseudo code of Alg1; Algorithm 3: Pseudo code of Alg2 |
| Open Source Code | Yes | We provide source codes of Q-Mamba online. |
| Open Datasets | Yes | A common choice of P in existing MetaBBO works is the COCO BBOB Testsuite (Hansen et al., 2021), which contains 24 basic synthetic functions, each of which can be extended to numerous problem instances by randomly rotating and shifting the decision variables. |
| Dataset Splits | Yes | A common choice of P in existing MetaBBO works is the COCO BBOB Testsuite (Hansen et al., 2021), which contains 24 basic synthetic functions... We divide it into 16 problem instances for training and 8 problem instances for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions general terms like 'training' and 'inferring time' without specifying the hardware used for these operations. |
| Software Dependencies | No | The paper mentions using 'AdamW' as an optimizer and 'Mamba-block' from a 'Mamba repo' (with a URL), but it does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA, which are crucial for reproducibility. |
| Experiment Setup | Yes | We use AdamW with a learning rate of 5e-3 to minimize the expected training objective E_{τ∼C}[J(τ\|θ)]. All baselines are trained for 300 epochs with batch size 64. In this paper, we set µ = 0.5 to strike a good balance. We additionally add a weight β (we set β = 10 in this paper) on the last action dimension... We set λ = 1 in this paper to strike a good balance. We represent the M = 16 action bins of each hyper-parameter A_i in A by 5-bit binary coding: 00000–01111. ...the total number of optimization steps for the low-level optimization is set as T = 500. |
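The quoted setup encodes each hyper-parameter's M = 16 action bins as 5-bit binary strings (00000 through 01111). A minimal sketch of such an encoding, with illustrative names not taken from the paper's code:

```python
# Sketch of a 5-bit fixed-width encoding for M = 16 action bins,
# matching the range 00000..01111 quoted in the experiment setup.
# Function and variable names here are illustrative assumptions.

M = 16  # number of action bins per hyper-parameter

def encode_bin(index: int, width: int = 5) -> str:
    """Encode a bin index in [0, M) as a fixed-width binary string."""
    if not 0 <= index < M:
        raise ValueError(f"bin index must be in [0, {M})")
    return format(index, f"0{width}b")

codes = [encode_bin(i) for i in range(M)]
print(codes[0], codes[-1])  # → 00000 01111
```

Note that 16 bins would fit in 4 bits; a 5-bit code as described leaves the leading bit zero across all 16 values, so the codes span 00000 to 01111.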
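The dataset-split row states that the 24 BBOB functions are divided into 16 training and 8 testing problem instances. A hypothetical sketch of such a split; the paper does not specify which functions land in which partition or whether the split is random, so the seed and shuffle below are assumptions:

```python
import random

# Hypothetical 16/8 train/test split over the 24 BBOB functions
# (f1..f24); the paper's actual assignment is not specified.
functions = list(range(1, 25))   # BBOB function ids f1..f24
rng = random.Random(0)           # fixed seed, assumed for reproducibility
rng.shuffle(functions)

train_funcs, test_funcs = functions[:16], functions[16:]
print(len(train_funcs), len(test_funcs))  # → 16 8
```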