Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning

Authors: Maximilian Hüttenrauch, Gerhard Neumann

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our algorithm performs comparably to state-of-the-art black-box optimizers on standard benchmark functions. Further, it clearly outperforms ranking-based methods and other policy-gradient-based black-box algorithms, as well as state-of-the-art deep reinforcement learning algorithms, when used for episodic reinforcement learning tasks.
Researcher Affiliation | Academia | Maximilian Hüttenrauch (EMAIL), Gerhard Neumann (EMAIL), Department of Computer Science, Karlsruhe Institute of Technology, Karlsruhe
Pseudocode | Yes | We provide pseudo-code for the robust target normalization technique in Algorithm 1 (Robust target normalization).
Open Source Code | Yes | An implementation of the algorithm can be found at https://github.com/ALRhub/cas-more
Open Datasets | Yes | The COCO framework (Hansen et al., 2021) provides continuous benchmark functions to compare the performance of optimizers on problems with different problem dimensions.
Dataset Splits | No | The paper defines a budget of function evaluations and target function values for the benchmark functions, but it provides no explicit training/validation/test splits in the sense used in supervised machine learning. The experiments run either on continuous benchmark functions or in simulated reinforcement learning environments.
Hardware Specification | No | The acknowledgements mention general computing clusters and supercomputers: support by the state of Baden-Württemberg through bwHPC; funding by the Federal Ministry of Education and Research (BMBF) and the state of Hesse as part of the NHR Program; and part of the work was performed on the HoreKa supercomputer, funded by the Ministry of Science, Research and the Arts Baden-Württemberg and by the Federal Ministry of Education and Research. No specific hardware details (e.g., CPU/GPU models, memory sizes) are given.
Software Dependencies | No | Table 3 lists optimizers such as 'adam' and notes the 'python package NLOpt (Johnson, 2014)', but no version numbers are provided for these or other key software components.
Experiment Setup | Yes | Table 2 gives empirically found default hyper-parameters for CAS-MORE based on the problem dimensionality n:
- K (population size): 4 + 3 log(n)
- Qmax (maximum queue size): max{1.5(1 + n + n(n+1)/2), 8(n+1)}
- ϵµ (trust region for the mean): 0.5
- ϵΣ (trust region for the covariance): 1.5 / (10 + n^1.5)
- cσ (smoothing factor of the evolution path): 1 / (2 + n^0.75)
- vclip (clip value for robust normalization): 3
- excess kurtosis threshold: 0.55
Table 3 lists hyper-parameters for the deep RL and BBRL experiments (e.g., samples per iteration, GAE λ, discount factor, epochs, learning rate, hidden layers).
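The Table 2 defaults above are all simple functions of the problem dimensionality n, so they can be collected into a small helper. This is only a sketch: the function name `cas_more_defaults` is ours, the ϵΣ and cσ entries are read as the fractions 1.5/(10 + n^1.5) and 1/(2 + n^0.75), and we assume a natural logarithm and integer truncation for the population size, as is conventional for such defaults.

```python
import math

def cas_more_defaults(n):
    """Sketch of the CAS-MORE default hyper-parameters from Table 2,
    as a function of the problem dimensionality n (assumptions: natural
    log and truncation for K; fraction layout of eps_sigma and c_sigma
    reconstructed from the table)."""
    return {
        "K": int(4 + 3 * math.log(n)),                  # population size
        "Q_max": max(1.5 * (1 + n + n * (n + 1) / 2),   # maximum queue size
                     8 * (n + 1)),
        "eps_mu": 0.5,                                  # trust region, mean
        "eps_sigma": 1.5 / (10 + n ** 1.5),             # trust region, covariance
        "c_sigma": 1 / (2 + n ** 0.75),                 # evolution-path smoothing
        "v_clip": 3,                                    # robust-normalization clip
        "kurtosis_threshold": 0.55,                     # excess kurtosis threshold
    }

print(cas_more_defaults(10))
```

For example, at n = 10 the population size evaluates to 10 and the queue size to 99, since 1.5(1 + 10 + 55) = 99 exceeds 8(10 + 1) = 88.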
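The report quotes Algorithm 1 (robust target normalization) but does not reproduce its steps. The sketch below only illustrates the general idea, assuming the normalization standardizes the sampled targets and clips values beyond vclip = 3 standard deviations (the Table 2 default); the function name and details are hypothetical and the paper's Algorithm 1 may differ.

```python
import numpy as np

def robust_normalize(targets, v_clip=3.0):
    """Hypothetical sketch of robust target normalization: standardize
    the targets and clip values more than v_clip standard deviations
    from the mean, so outlier returns cannot dominate the fit."""
    t = np.asarray(targets, dtype=float)
    std = t.std()
    if std == 0.0:  # all targets equal: nothing to normalize
        return np.zeros_like(t)
    z = (t - t.mean()) / std
    return np.clip(z, -v_clip, v_clip)

print(robust_normalize([1.0, 2.0, 3.0, 100.0]))
```

With this scheme a single extreme target among many (e.g., one return of 1000 among 99 zeros) is capped at the clip value instead of stretching the normalized scale.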