Meta-Black-Box-Optimization through Offline Q-function Learning

Authors: Zeyuan Ma, Zhiguang Cao, Zhou Jiang, Hongshu Guo, Yue-Jiao Gong

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive benchmarking, we observe that Q-Mamba achieves competitive or even superior performance to prior online/offline baselines, while significantly improving the training efficiency of existing online baselines." "Experimental results show that our Q-Mamba effectively achieves competitive or even superior optimization performance to prior online/offline learning baselines, while consuming at most half training budget of the online baselines."
Researcher Affiliation | Academia | "South China University of Technology, China; Singapore Management University, Singapore. Correspondence to: Yue-Jiao Gong <EMAIL>."
Pseudocode | Yes | "Algorithm 1: Pseudo code of Alg0; Algorithm 2: Pseudo code of Alg1; Algorithm 3: Pseudo code of Alg2"
Open Source Code | Yes | "We provide sourcecodes of Q-Mamba online."
Open Datasets | Yes | "A common choice of P in existing MetaBBO works is the CoCo BBOB Testsuites (Hansen et al., 2021), which contains 24 basic synthetic functions, each can be extended to numerous problem instances by randomly rotating and shifting the decision variables."
Dataset Splits | Yes | "A common choice of P in existing MetaBBO works is the CoCo BBOB Testsuites (Hansen et al., 2021), which contains 24 basic synthetic functions... We divide it into 16 problem instances for training and 8 problem instances for testing."
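The reported 16/8 split of the 24 BBOB basic functions can be sketched as below. This is an illustrative assumption only: the quoted passage does not state which functions fall in which partition, so the seeded random assignment and the `split_bbob` helper are ours, not the authors'.

```python
import random

def split_bbob(seed=0, n_functions=24, n_train=16):
    """Hypothetical partition of BBOB functions f1..f24 into a
    16-function training set and an 8-function testing set."""
    rng = random.Random(seed)
    ids = list(range(1, n_functions + 1))  # BBOB function indices f1..f24
    rng.shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])

train_ids, test_ids = split_bbob()
```

Each retained function can then be expanded into many training instances by random rotation and shifting of the decision variables, as the quoted passage describes.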
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions general terms like 'training' and 'inferring time' without specifying the hardware used for these operations.
Software Dependencies | No | The paper mentions using 'AdamW' as an optimizer and 'Mamba-block' from a 'Mamba repo' (with a URL), but it does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA, which are crucial for reproducibility.
Experiment Setup | Yes | "We use AdamW with a learning rate 5e-3 to minimize the expectation training objective E_τ[J(τ|θ)]. All baselines are trained for 300 epochs with batch size 64. In this paper, we set µ = 0.5 to strike a good balance. We additionally add a weight β (we set β = 10 in this paper) on the last action dimension... We set λ = 1 in this paper to strike a good balance. We represent the M = 16 action bins of each hyper-parameter A_i in A by 5-bit binary coding: 00000 to 01111. ...the total optimization steps for the low-level optimization is set as T = 500."
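The 5-bit bin coding mentioned in the setup (M = 16 action bins per hyper-parameter, codes 00000 to 01111) can be sketched as follows. The function names are ours; the paper is only quoted as stating the bin count and the bit-string range.

```python
M = 16  # action bins per hyper-parameter, as stated in the setup

def encode_action_bin(index, width=5):
    """Map a bin index 0..15 to its fixed-width 5-bit binary code,
    '00000' through '01111' (the leading bit is always 0 for 16 bins)."""
    assert 0 <= index < M, "bin index out of range"
    return format(index, f"0{width}b")

def decode_action_bin(code):
    """Recover the bin index from its binary code string."""
    return int(code, 2)
```

Note that 16 bins would fit in 4 bits; the quoted 5-bit representation leaves the high bit unused (always 0), which the sketch reproduces as stated.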