Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning
Authors: Maximilian Hüttenrauch, Gerhard Neumann
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our algorithm performs comparably to state-of-the-art black-box optimizers on standard benchmark functions. Further, it clearly outperforms ranking-based methods, other policy-gradient-based black-box algorithms, and state-of-the-art deep reinforcement learning algorithms when used for episodic reinforcement learning tasks. |
| Researcher Affiliation | Academia | Maximilian Hüttenrauch EMAIL, Gerhard Neumann EMAIL, Department of Computer Science, Karlsruhe Institute of Technology, Karlsruhe |
| Pseudocode | Yes | We provide pseudo-code for the robust target normalization technique in Algorithm 1. Algorithm 1 Robust target normalization |
| Open Source Code | Yes | An implementation of the algorithm can be found under https://github.com/ALRhub/cas-more |
| Open Datasets | Yes | The COCO framework (Hansen et al., 2021) provides continuous benchmark functions to compare the performance of optimizers on problems with different problem dimensions. |
| Dataset Splits | No | The paper defines a budget of function evaluations and target function values on benchmark functions. It does not provide explicit training/validation/test splits as typically used in supervised machine learning; the experiments run either on continuous benchmark functions or in simulated reinforcement learning environments. |
| Hardware Specification | No | The authors acknowledge support by the state of Baden-Württemberg through bwHPC. Research that led to this work was funded by the Federal Ministry of Education and Research (BMBF) and the state of Hesse as part of the NHR Program. Further, part of this work was performed on the HoreKa supercomputer funded by the Ministry of Science, Research and the Arts Baden-Württemberg and by the Federal Ministry of Education and Research. These mentions refer to general computing clusters/supercomputers but do not provide specific hardware details (e.g., CPU/GPU models, memory sizes). |
| Software Dependencies | No | Table 3 lists optimizers like 'adam' and notes 'python package NLOpt (Johnson, 2014)', but no specific version numbers are provided for these or other key software components. |
| Experiment Setup | Yes | Table 2 gives empirically found default hyper-parameters for CAS-MORE based on the problem dimensionality n: K (population size) = 4 + 3 log(n); Qmax (maximum queue size) = max{⌈1.5(1 + n + n(n + 1)/2)⌉, 8(n + 1)}; ϵµ (trust region for the mean) = 0.5; ϵΣ (trust region for the covariance) = 1.5/(10 + n^1.5); cσ (smoothing factor of the evolution path) = 1/(2 + n^0.75); vclip (clip value for robust normalization) = 3; excess kurtosis threshold = 0.55. Table 3 lists hyper-parameters for the deep RL and BBRL experiments (e.g., samples per iteration, GAE λ, discount factor, epochs, learning rate, hidden layers). |
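The Table 2 defaults are all closed-form functions of the problem dimensionality n, so they can be computed directly. The sketch below encodes them as reported; the `math.ceil` on the queue-size term and the integer truncation of the population size are assumptions about the rounding convention, which the table does not state.

```python
import math

def cas_more_defaults(n):
    """Default CAS-MORE hyper-parameters from Table 2, as a function of
    problem dimensionality n. Rounding of K and Q_max is an assumption."""
    return {
        "K": 4 + int(3 * math.log(n)),              # population size
        "Q_max": max(math.ceil(1.5 * (1 + n + n * (n + 1) / 2)),
                     8 * (n + 1)),                   # maximum queue size
        "eps_mu": 0.5,                               # trust region, mean
        "eps_sigma": 1.5 / (10 + n ** 1.5),          # trust region, covariance
        "c_sigma": 1.0 / (2 + n ** 0.75),            # evolution-path smoothing
        "v_clip": 3.0,                               # robust-normalization clip
        "kurtosis_threshold": 0.55,                  # excess kurtosis threshold
    }

print(cas_more_defaults(10))
```

For n = 10 this yields K = 10 and Q_max = 99, i.e., the queue-size term 1.5(1 + n + n(n+1)/2) dominates the 8(n+1) floor already at moderate dimensionality.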
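The pseudocode row above refers to Algorithm 1, "Robust target normalization", which the report does not reproduce. As a rough illustration of the idea the hyper-parameters suggest (clip value vclip = 3, an excess-kurtosis threshold of 0.55), here is a hedged sketch: standardize the sampled returns and clip outliers at ±vclip standard deviations. This is not the authors' exact procedure; the kurtosis test and the function name are illustrative assumptions.

```python
import numpy as np

def robust_target_normalize(returns, v_clip=3.0):
    """Illustrative sketch of robust target normalization: z-score the
    returns, then clip to [-v_clip, v_clip] so heavy-tailed samples
    cannot dominate the update. Not the paper's exact Algorithm 1."""
    r = np.asarray(returns, dtype=float)
    z = (r - r.mean()) / (r.std() + 1e-8)  # standardize (epsilon avoids /0)
    return np.clip(z, -v_clip, v_clip)      # bound the influence of outliers
```

A single extreme return (e.g., one value of 1000 among near-zero returns) is mapped to at most v_clip after normalization, which is the robustness property the algorithm name refers to.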