BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

Authors: Yu Heng Hung, Kai-Jie Lin, Yu-Heng Lin, Chien-Yi Wang, Cheng Sun, Ping-Chun Hsieh

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive evaluation, we demonstrate that BOFormer consistently outperforms the benchmark rule-based and learning-based algorithms in various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We evaluate the proposed BOFormer on a variety of black-box functions, including both synthetic optimization functions and real-world hyperparameter optimization on HPO-3DGS. Unless stated otherwise, we report the average attained hypervolume at the final step over 100 evaluation episodes in the main text. Table 1 and Figure 6 show the averaged hypervolume on synthetic problems. We also conducted an ablation study evaluating how the sequence length (denoted by w) affects the hypervolume performance of BOFormer. Figure 4 shows that the hypervolume of non-Markovian BOFormer (w > 1) is superior to that of Markovian BOFormer (w = 1).
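The reported metric is the hypervolume indicator: the volume of objective space dominated by the attained Pareto front, relative to a reference point. A minimal sketch for the two-objective (K = 2) maximization case follows; the function name and reference point are illustrative, not taken from the paper:

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (maximization) above reference `ref`."""
    # Keep only points that strictly dominate the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep in decreasing order of the first objective, accumulating
    # the area slice each non-dominated point contributes.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:  # skip points dominated within the sweep
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # 6.0
```

Averaging this quantity at the final step over 100 evaluation episodes yields the numbers the review cites from Table 1 and Figure 6.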
Researcher Affiliation Collaboration 1: National Yang Ming Chiao Tung University, Hsinchu, Taiwan; 2: NVIDIA Research
Pseudocode Yes The pseudo code is in Algorithm 1 in Appendix C. ... C.1 Pseudo Code of BOFormer The detailed pseudo code of the training process for BOFormer under the off-policy and on-policy learning settings is provided in Algorithms 1 and 2, respectively.
Open Source Code Yes We have made the source code publicly available to encourage further research in this direction. (https://hungyuheng.github.io/BOFormer/)
Open Datasets Yes We use 4 different objects from (Mildenhall et al., 2021) and 64 different chairs from (Yu et al., 2023) to compare the performance of different hyperparameter tuning methods.
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits with percentages, sample counts, or specific split files. It mentions training on synthetic GP functions and evaluating on synthetic functions and HPO-3DGS, but not how a fixed dataset is split for these purposes. For HPO-3DGS, it states 'While testing on chairs, each episode is conducted with an individual scene,' which describes an evaluation method rather than a dataset split for reproduction.
Hardware Specification Yes Notably, the above training was already on a high-end GPU server with NVIDIA RTX 6000 Ada Generation GPUs and Intel Xeon Gold 5515+ CPU.
Software Dependencies No The paper mentions several software components like 'Adam (Kingma & Ba, 2015)', 'GPT-2-based Transformer architecture', 'BoTorch (Balandat et al., 2020)', and 'deep learning frameworks (e.g., PyTorch)', but it does not specify concrete version numbers for any of these libraries or frameworks used in their implementation.
Experiment Setup Yes A.2 HYPERPARAMETERS OF LEARNING-BASED APPROACHES BOFormer: hidden size: 128 for all linear layers used to embed positional encodings, state-action pairs, rewards, and Q-values; learning rate: 1e-5; weight decay: 1e-5; r_demo: 0.01; batch size: 8; number of attention layers: 8; number of heads per attention layer: 4; window size w = 31; dropout: 0.1; buffer size: 64; training episodes: 3000. FSAF: alpha: 0.8; hidden size: 100; learning rate: 0.01; batch size: 128; few-shot steps: 5; number of particles: 5; total tasks: 3; size of meta data: 100; use demo: True; early terminate: False; select type: average; training episodes: 300 for K = 2 and 500 for K = 3.
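For reproduction, the reported BOFormer settings can be collected into a single configuration object. The sketch below simply restates the values quoted above; the dict key names are illustrative and not from the paper's code:

```python
# BOFormer training hyperparameters as reported in Appendix A.2.
# Key names are assumptions for illustration; values come from the paper.
boformer_config = {
    "hidden_size": 128,          # all embedding linear layers
    "learning_rate": 1e-5,
    "weight_decay": 1e-5,
    "r_demo": 0.01,
    "batch_size": 8,
    "num_attention_layers": 8,
    "num_heads": 4,
    "window_size": 31,           # sequence length w
    "dropout": 0.1,
    "buffer_size": 64,
    "training_episodes": 3000,
}

print(boformer_config["window_size"])  # 31
```

Note that the ablation in Figure 4 varies `window_size` (w = 1 recovers the Markovian variant), so this is the one field a reproduction would sweep.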