EARL-BO: Reinforcement Learning for Multi-Step Lookahead, High-Dimensional Bayesian Optimization
Authors: Mujin Cheon, Jay H Lee, Dong-Yeun Koh, Calvin Tsay
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method, EARL-BO (Encoder Augmented RL for BO), on synthetic benchmark functions and hyperparameter tuning problems, finding significantly improved performance compared to existing multi-step lookahead and high-dimensional BO methods. The paper reports comprehensive evaluations of EARL-BO across both synthetic benchmark functions and real-world hyperparameter tuning, against other multi-step lookahead and high-dimensional optimization methods. |
| Researcher Affiliation | Academia | 1Department of Computing, Imperial College London, UK 2Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science & Technology (KAIST), South Korea 3Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, USA. Correspondence to: Calvin Tsay <EMAIL>. |
| Pseudocode | Yes | EARL-BO Summary (Algorithm 1). The EARL-BO algorithm implements a hybrid model-based and model-free RL approach, loosely following the Dyna framework (Silver et al., 2008; Wu et al., 2023). ... Algorithm 1 EARL-BO. Input: data Dk, action bounds [lb, ub]. Parameters: lookahead horizon, max episodes, update episodes, off-policy episodes. Output: next query point x_{k+1}. Initialize RL agent (PPO agent), encoder network, and memory buffer; fit GP to Dk. For k = 1 to max episodes: reset environment state s with Dk; for step = 1 to lookahead horizon: encode state s using the encoder network; if k ≤ off-policy episodes, select action a using the TuRBO acquisition, else select action a using the RL agent; sample y_{k+1} ~ N(µ_k(x; Dk), K_k(x, x; Dk)); compute reward r = R(Dk, x_{k+1}, D_{k+1}) using y_{k+1}; update environment state s′ = D_{k+1}; store transition (s, a, s′, r) in the memory buffer; set s ← s′. If k mod update episodes = 0: if k ≤ off-policy episodes, update the RL agent with the initial policy; else calculate the PPO loss using the memory buffer and train the actor, critic, and encoder networks. Clear the memory buffer. Finally, encode the state using the final encoder network and return the x_{k+1} output from the final actor network. |
| Open Source Code | No | The paper only mentions the use of third-party implementations for baselines (e.g., 'For EI, Random, and Rollout VR, we use implementations from (Lee et al., 2020) found at: https://github.com/erichanslee/lookahead_release. For TuRBO, we use the current implementation from the Uber research group: https://github.com/uber-research/TuRBO.') but does not provide any link or explicit statement about releasing the source code for the proposed EARL-BO method. |
| Open Datasets | Yes | We next evaluate EARL-BO in real-world scenarios using the Hyperparameter Optimization Benchmarks (HPO-B) dataset (Arango et al., 2021). The real-world Hyperparameter Optimization dataset is sourced from the HPO-B dataset (Arango et al., 2021), a collection of HPO datasets grouped by search space and tasks. |
| Dataset Splits | Yes | We initialize the BO algorithms using 30 random points within the search space and evaluate performance using simple regret (y_opt − y*_k). Each method is tested for ten replications by resampling the initial dataset. We initialize the BO algorithms using five random points for the 6- and 8-D problems and 50 for the 19-D problem. |
| Hardware Specification | Yes | We conducted our experiments on a computing server equipped with AMD EPYC 7742 processors. The specific allocation for each job was as follows: 16 CPUs and a maximum memory of 100 GB. |
| Software Dependencies | No | The paper mentions software components and algorithms such as PPO and Adam, but does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Table 1 displays a comprehensive list of hyperparameters for EARL-BO. We would like to underline that none of these presented hyperparameter values were tuned across problems. In other words, across various dimensions and function forms, we have kept the same hyperparameters with the most basic PPO and encoder values. ... Table 1. EARL-BO hyperparameter values. PPO agent: learning rate 0.001; # epochs 100; epsilon clip ϵ 0.2; Adam β values (0.9, 0.999); discount factor γ 0.95; value function coefficient 0.5; entropy coefficient 0.1; # layers frozen 2; max episodes 4000; update frequency 50; # off-policy episodes 400; no-improvement threshold 15; horizon 5. Encoder: hidden dimension 64; output dimension 16; learning rate 0.01. GP: kernel RBF + White kernel; RBF length-scale bounds (1e-2, 1e2); noise bounds (1e-10, 1e1). |
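The control flow of Algorithm 1 summarized in the table can be sketched in Python. This is a minimal illustration under heavy simplification, not the authors' implementation: the PPO agent and the TuRBO acquisition are replaced by random placeholder policies, the encoder network is omitted, and `gp_posterior` is a toy RBF-kernel GP. All names (`earl_bo_step`, `gp_posterior`) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def gp_posterior(x, X, y, length_scale=1.0, noise=1e-6):
    """Toy GP posterior mean/variance with an RBF kernel (cf. Table 1)."""
    def k(a, b):
        d = a[:, None, :] - b[None, :, :]
        return np.exp(-0.5 * np.sum(d**2, axis=-1) / length_scale**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(x, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.maximum(var, 1e-12)

def earl_bo_step(X, y, horizon=5, max_episodes=20, off_policy_episodes=5):
    """Simulate multi-step lookahead rollouts on the GP surrogate and
    return a next query point (placeholder policies, unit-cube bounds)."""
    best_x, best_reward = None, -np.inf
    for k in range(1, max_episodes + 1):
        Xk, yk = X.copy(), y.copy()          # reset fantasy dataset D_k
        total_reward, first_x = 0.0, None
        for _ in range(horizon):
            if k <= off_policy_episodes:
                # warm-start phase: a random local proposal stands in
                # for the TuRBO acquisition used in the paper
                a = np.clip(Xk[np.argmax(yk)]
                            + 0.1 * rng.standard_normal(X.shape[1]), 0, 1)
            else:
                # placeholder for the trained PPO policy
                a = rng.uniform(0, 1, X.shape[1])
            mu, var = gp_posterior(a[None, :], Xk, yk)
            y_sim = rng.normal(mu[0], np.sqrt(var[0]))   # sample GP posterior
            total_reward += max(y_sim - yk.max(), 0.0)   # improvement reward
            Xk = np.vstack([Xk, a])                      # fantasy update D_{k+1}
            yk = np.append(yk, y_sim)
            if first_x is None:
                first_x = a
        if total_reward > best_reward:
            best_reward, best_x = total_reward, first_x
    return best_x
```

In the actual method the rollout reward trains the PPO actor/critic and encoder; here we simply keep the first action of the best-scoring rollout, which preserves the model-based "plan on the surrogate, act in the real problem" structure.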