Multi-Objective Neural Bandits with Random Scalarization

Authors: Ji Cheng, Bo Xue, Chengyu Lu, Ziqiang Cui, Qingfu Zhang

IJCAI 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We conduct experiments on synthetic, multi-objective optimization, and real-world cases, where the evaluations demonstrate the superior performance of our methods."
Researcher Affiliation | Academia | "(1) Department of Computer Science, City University of Hong Kong, Hong Kong; (2) The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China. EMAIL, EMAIL"
Pseudocode | Yes | "Algorithm 1: Multi-objective Neural Upper Confidence Bound with Scalarization (MONeural-UCB); Algorithm 2: Multi-objective Neural Thompson Sampling with Scalarization (MONeural-TS)"
Open Source Code | Yes | "Code is available through https://github.com/jicheng9617/MONB."
Open Datasets | Yes | "We further empirically evaluate our methods in two real-world public datasets in the multitask learning community, i.e., multi MNIST [Sabour et al., 2017] and multi Fashion MNIST [Lin et al., 2019]. We repeat the experiments using three classical multi-objective optimization (MOO) problems [Zhang et al., 2009; Lin et al., 2022]."
Dataset Splits | No | The paper describes how synthetic data are generated and how contexts are sampled for the MOO cases, but it does not specify explicit training/validation/test splits for reproducibility. For multi MNIST and multi Fashion MNIST, it describes how input features are paired with output labels to form contextual feature vectors and how arms are selected, rather than specifying data splits.
Hardware Specification | Yes | "All implementations are performed on a dedicated system configured with an Intel Core i7-9700K CPU and an NVIDIA GeForce RTX 2080 Ti GPU."
Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide version numbers for any software components or libraries.
Experiment Setup | Yes | "Each objective was estimated by a neural network with one hidden layer containing 100 neurons, and furthermore, we trained the networks with Adam optimizer by the learning rate η = 0.005, and with the step J = 1 each round. For the exploration factor, γ in MONeural-UCB and ρ in MONeural-TS, we chose 0.1 in these experiments." "Based on hyperparameter tuning, we train two-layer neural networks with hidden layers M = 200. For neural bandits, we choose γ = 0.01 and ρ = 0.05 for the UCB and TS methods, respectively."
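The paper's Algorithms 1 and 2 are not reproduced above, but the core idea of random scalarization in a UCB-style arm selection can be illustrated with a minimal sketch. The function name `scalarized_ucb_select`, the Dirichlet distribution over scalarization weights, and the per-arm confidence width shared across objectives are assumptions made here for illustration; in the paper these quantities come from neural network reward estimates, not a given matrix.

```python
import numpy as np

def scalarized_ucb_select(mean_est, width, rng):
    """Pick an arm by scalarizing per-objective UCB scores with random weights.

    mean_est : (n_arms, n_obj) array of estimated rewards per objective
    width    : (n_arms,) array of confidence widths, shared across objectives
    rng      : numpy Generator used to sample the scalarization weights
    """
    n_arms, n_obj = mean_est.shape
    # Draw a random weight vector from the probability simplex.
    w = rng.dirichlet(np.ones(n_obj))
    # Optimistic scalarized score: weighted mean estimates plus exploration bonus.
    scores = mean_est @ w + width
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
means = np.array([[0.2, 0.9],   # arm 0: good on objective 2
                  [0.8, 0.1],   # arm 1: good on objective 1
                  [0.5, 0.5]])  # arm 2: balanced
widths = np.full(3, 0.05)
arm = scalarized_ucb_select(means, widths, rng)
```

Because the weights are redrawn each round, repeated calls spread pulls across arms that are optimal for different trade-offs, which is what lets a scalarized bandit cover the Pareto front.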
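The quoted per-objective setup (one hidden layer of 100 neurons, Adam with learning rate η = 0.005, J = 1 gradient step per round) can be sketched as follows. The class name `TinyRewardNet` and the plain-NumPy Adam implementation are illustrative assumptions, not the authors' code, which trains its networks with a framework optimizer.

```python
import numpy as np

class TinyRewardNet:
    """One-hidden-layer reward estimator, updated with one Adam step per round."""

    def __init__(self, dim, hidden=100, lr=0.005, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 1 / np.sqrt(dim), (dim, hidden))
        self.w2 = rng.normal(0, 1 / np.sqrt(hidden), hidden)
        self.lr, self.t = lr, 0
        # Adam first/second-moment buffers, one pair per parameter tensor.
        self.m = [np.zeros_like(self.W1), np.zeros_like(self.w2)]
        self.v = [np.zeros_like(self.W1), np.zeros_like(self.w2)]

    def predict(self, x):
        # ReLU hidden layer followed by a linear output.
        return np.maximum(x @ self.W1, 0.0) @ self.w2

    def update(self, x, reward, b1=0.9, b2=0.999, eps=1e-8):
        """One Adam step (J = 1) on the squared loss for a single round."""
        h = np.maximum(x @ self.W1, 0.0)          # hidden activations
        err = h @ self.w2 - reward                # prediction error
        g_w2 = err * h                            # grad of 0.5*err^2 w.r.t. w2
        g_W1 = err * np.outer(x, self.w2 * (h > 0))
        self.t += 1
        for i, g in enumerate([g_W1, g_w2]):
            self.m[i] = b1 * self.m[i] + (1 - b1) * g
            self.v[i] = b2 * self.v[i] + (1 - b2) * g * g
            mh = self.m[i] / (1 - b1 ** self.t)   # bias-corrected moments
            vh = self.v[i] / (1 - b2 ** self.t)
            step = self.lr * mh / (np.sqrt(vh) + eps)
            if i == 0:
                self.W1 -= step
            else:
                self.w2 -= step

# Toy usage: repeatedly observe reward 1.0 for one context and fit it.
net = TinyRewardNet(dim=5)
x = np.ones(5) / np.sqrt(5)
err_before = abs(net.predict(x) - 1.0)
for _ in range(200):
    net.update(x, 1.0)
err_after = abs(net.predict(x) - 1.0)
```

In a full bandit loop one such estimator per objective would be queried for the scalarized score each round and updated on the observed reward vector of the pulled arm.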