Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning

Authors: Maximilian Hüttenrauch, Gerhard Neumann

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our algorithm performs comparably to state-of-the-art black-box optimizers on standard benchmark functions. Further, it clearly outperforms ranking-based methods and other policy-gradient-based black-box algorithms, as well as state-of-the-art deep reinforcement learning algorithms, when used for episodic reinforcement learning tasks.
Researcher Affiliation | Academia | Maximilian Hüttenrauch (EMAIL), Gerhard Neumann (EMAIL), Department of Computer Science, Karlsruhe Institute of Technology, Karlsruhe
Pseudocode | Yes | We provide pseudo-code for the robust target normalization technique in Algorithm 1 (Robust target normalization).
Open Source Code | Yes | An implementation of the algorithm can be found at https://github.com/ALRhub/cas-more
Open Datasets | Yes | The COCO framework (Hansen et al., 2021) provides continuous benchmark functions to compare the performance of optimizers on problems with different problem dimensions.
Dataset Splits | No | The paper defines a budget of function evaluations and target function values for the benchmark functions, but it provides no explicit training/validation/test splits in the sense used in supervised machine learning. The experiments run either on continuous benchmark functions or in simulated reinforcement learning environments.
Hardware Specification | No | The acknowledgements mention general computing clusters and supercomputers: support by the state of Baden-Württemberg through bwHPC; funding by the Federal Ministry of Education and Research (BMBF) and the state of Hesse as part of the NHR Program; and part of the work was performed on the HoreKa supercomputer, funded by the Ministry of Science, Research and the Arts Baden-Württemberg and by the Federal Ministry of Education and Research. No specific hardware details (e.g., CPU/GPU models, memory sizes) are given.
Software Dependencies | No | Table 3 lists optimizers such as 'adam' and notes the 'python package NLOpt (Johnson, 2014)', but no version numbers are provided for these or other key software components.
Experiment Setup | Yes | Table 2 gives empirically found default hyper-parameters for CAS-MORE based on the problem dimensionality n:
- K (population size): 4 + 3 log(n)
- Qmax (maximum queue size): max{1.5(1 + n + n(n+1)/2), 8(n+1)}
- ϵµ (trust region for the mean): 0.5
- ϵΣ (trust region for the covariance): 1.5 / (10 + n^1.5)
- cσ (smoothing factor of the evolution path): 1 / (2 + n^0.75)
- vclip (clip value for robust normalization): 3
- excess kurtosis threshold: 0.55
Table 3 lists hyper-parameters for the deep RL and BBRL experiments (e.g., samples per iteration, GAE λ, discount factor, epochs, learning rate, hidden layers).
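The Table 2 defaults above are all simple functions of the problem dimensionality n, so they can be collected into a small helper. This is only a sketch: the function name `cas_more_defaults` is ours, the ϵΣ and cσ entries are read as the fractions 1.5/(10 + n^1.5) and 1/(2 + n^0.75), and we assume a natural logarithm and integer truncation for the population size, as is conventional for such defaults.

```python
import math

def cas_more_defaults(n):
    """Sketch of the CAS-MORE default hyper-parameters from Table 2,
    as a function of the problem dimensionality n (assumptions: natural
    log and truncation for K; fraction layout of eps_sigma and c_sigma
    reconstructed from the table)."""
    return {
        "K": int(4 + 3 * math.log(n)),                  # population size
        "Q_max": max(1.5 * (1 + n + n * (n + 1) / 2),   # maximum queue size
                     8 * (n + 1)),
        "eps_mu": 0.5,                                  # trust region, mean
        "eps_sigma": 1.5 / (10 + n ** 1.5),             # trust region, covariance
        "c_sigma": 1 / (2 + n ** 0.75),                 # evolution-path smoothing
        "v_clip": 3,                                    # robust-normalization clip
        "kurtosis_threshold": 0.55,                     # excess kurtosis threshold
    }

print(cas_more_defaults(10))
```

For example, at n = 10 the population size evaluates to 10 and the queue size to 99, since 1.5(1 + 10 + 55) = 99 exceeds 8(10 + 1) = 88.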
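The report quotes Algorithm 1 (robust target normalization) but does not reproduce its steps. The sketch below only illustrates the general idea, assuming the normalization standardizes the sampled targets and clips values beyond vclip = 3 standard deviations (the Table 2 default); the function name and details are hypothetical and the paper's Algorithm 1 may differ.

```python
import numpy as np

def robust_normalize(targets, v_clip=3.0):
    """Hypothetical sketch of robust target normalization: standardize
    the targets and clip values more than v_clip standard deviations
    from the mean, so outlier returns cannot dominate the fit."""
    t = np.asarray(targets, dtype=float)
    std = t.std()
    if std == 0.0:  # all targets equal: nothing to normalize
        return np.zeros_like(t)
    z = (t - t.mean()) / std
    return np.clip(z, -v_clip, v_clip)

print(robust_normalize([1.0, 2.0, 3.0, 100.0]))
```

With this scheme a single extreme target among many (e.g., one return of 1000 among 99 zeros) is capped at the clip value instead of stretching the normalized scale.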