Multi-Objective Neural Bandits with Random Scalarization
Authors: Ji Cheng, Bo Xue, Chengyu Lu, Ziqiang Cui, Qingfu Zhang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on synthetic, multi-objective optimization, and real-world cases, where the evaluations demonstrate the superior performance of our methods. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, City University of Hong Kong, Hong Kong; ²The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Multi-objective Neural Upper Confidence Bound with Scalarization (MONeural-UCB); Algorithm 2: Multi-objective Neural Thompson Sampling with Scalarization (MONeural-TS) |
| Open Source Code | Yes | Code is available through https://github.com/jicheng9617/MONB. |
| Open Datasets | Yes | We further empirically evaluate our methods on two real-world public datasets from the multitask learning community, i.e., Multi-MNIST [Sabour et al., 2017] and Multi-Fashion-MNIST [Lin et al., 2019]. We repeat the experiments using three classical multi-objective optimization (MOO) problems [Zhang et al., 2009; Lin et al., 2022]. |
| Dataset Splits | No | The paper describes how synthetic data is generated and how contexts are sampled for the MOO cases, but it does not specify explicit training/validation/test splits for reproducibility. For Multi-MNIST and Multi-Fashion-MNIST, it describes how input features are paired with output labels to form contextual feature vectors and how arms are selected, rather than specifying data splits. |
| Hardware Specification | Yes | All implementations are performed on a dedicated system configured with an Intel Core i7-9700K CPU and an NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes | Each objective was estimated by a neural network with one hidden layer containing 100 neurons; the networks were trained with the Adam optimizer at learning rate η = 0.005, with J = 1 gradient step per round. The exploration factors, γ in MONeural-UCB and ρ in MONeural-TS, were set to 0.1 in these experiments. Based on hyperparameter tuning, two-layer neural networks with hidden width M = 200 were trained; for the neural bandits, γ = 0.01 and ρ = 0.05 were chosen for the UCB and TS methods, respectively. |
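To make the reported algorithmic idea concrete, the per-round decision rule shared by MONeural-UCB-style methods can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the per-arm mean and uncertainty estimates (which the paper obtains from a trained neural network) are passed in as plain arrays, a linear scalarization of the randomly sampled weights is assumed, and γ = 0.1 follows the exploration factor quoted in the experiment setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_scalarization_weights(m, rng):
    """Sample a weight vector uniformly from the (m-1)-simplex.

    Hypothetical helper: one way to draw random scalarization weights
    over m objectives each round.
    """
    return rng.dirichlet(np.ones(m))

def select_arm_ucb(mean_est, uncertainty, weights, gamma=0.1):
    """Scalarize per-objective UCB scores and pick the best arm.

    mean_est, uncertainty: arrays of shape (n_arms, m), standing in for
    the neural network's reward estimates and confidence widths.
    Assumes a linear scalarization; the paper's exact scalarization
    function may differ.
    """
    ucb = mean_est + gamma * uncertainty  # optimistic score per objective
    return int(np.argmax(ucb @ weights))  # best scalarized arm

# Toy round with 2 arms and 2 objectives; arm 1 dominates arm 0.
mean_est = np.array([[0.1, 0.2],
                     [0.8, 0.9]])
uncertainty = np.zeros((2, 2))
w = random_scalarization_weights(2, rng)
arm = select_arm_ucb(mean_est, uncertainty, w)
```

Because arm 1 dominates arm 0 in every objective, any nonnegative weight vector on the simplex selects arm 1; the randomness of the weights only matters when arms trade off objectives against each other.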