Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach

Authors: Arman Rahbar, Niklas Åkerblom, Morteza Haghir Chehreghani

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the effectiveness of our framework via several experimental studies performed on data sets from a variety of important domains. We find that Thompson Sampling yields the best performance in most of the experiments we perform. In this section, we empirically validate our cost-efficient online decision making framework via various experiments.
Researcher Affiliation Collaboration Arman Rahbar (Chalmers University of Technology and University of Gothenburg); Niklas Åkerblom (Volvo Car Corporation; Chalmers University of Technology and University of Gothenburg); Morteza Haghir Chehreghani (Chalmers University of Technology and University of Gothenburg)
Pseudocode Yes Algorithm 1 Online decision making protocol Algorithm 2 Cost-efficient online decision making Algorithm 3 Posterior update Algorithm 4 Cost-efficient approximate oracle
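The paper's Algorithm 3 ("Posterior update") is not reproduced in this report. Given the Beta(2,2) priors and binary test outcomes described in the experiment setup row below, a minimal conjugate Beta-Bernoulli update might look like the following sketch; the function name and structure are illustrative, not the authors' pseudocode:

```python
# Hypothetical sketch of a conjugate Beta-Bernoulli posterior update,
# assuming binary (0/1) test outcomes and Beta(2, 2) priors as stated
# in the paper's experiment setup. Not the authors' Algorithm 3.

def update_posterior(alpha, beta, outcome):
    """Return updated Beta parameters after observing a 0/1 outcome."""
    return alpha + outcome, beta + (1 - outcome)

alpha, beta = 2.0, 2.0           # Beta(2, 2) prior
for outcome in [1, 1, 0, 1]:     # a stream of observed binary outcomes
    alpha, beta = update_posterior(alpha, beta, outcome)

posterior_mean = alpha / (alpha + beta)  # Beta mean after 4 observations
```

Each observed success increments the first shape parameter and each failure the second, so the posterior remains a Beta distribution throughout.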
Open Source Code Yes The source code for our experiments can be accessed on GitHub.
Open Datasets Yes We apply our framework to the LED display domain data set from the UCI repository (Breiman et al., 1988). Additionally, we use the ProPublica recidivism (Compas) data set (Larson et al., 2016) and the Fair Isaac (Fico) credit risk data set (FICO et al., 2018), which both are preprocessed in the same way as by Hu et al. (2019). ... Finally, to demonstrate the applicability of our methods to real-valued test outcomes, we investigate the Breast Cancer Wisconsin data set (Street et al., 1993) (Section 6.3).
Dataset Splits No The paper describes an online decision-making framework where problem instances are processed sequentially. It mentions scenarios and the generation of a synthetic dataset but does not provide specific training/validation/test splits, percentages, or explicit methodologies for partitioning data in a reproducible manner for traditional supervised learning evaluation.
Hardware Specification Yes We run each experiment with 5 different random seeds on a single-node CPU machine with 8 cores, 16 GB memory and macOS.
Software Dependencies No We implement the algorithms in Python mainly using NumPy (Harris et al., 2020), NetworkX (Hagberg et al., 2008) and SciPy (Virtanen et al., 2020). The paper mentions software libraries but does not provide specific version numbers for Python or any of the listed libraries.
Experiment Setup Yes Unless otherwise specified, we consider Beta(2,2) as the prior distribution of all θ_ij's. ... We assign fixed costs µ_ij^(0) ∈ [0, 1] and µ_ij^(1) ∈ [0, 1] (randomly sampled from a uniform distribution) for each data set before performing our experiments. ... The (maximum) numbers of enumerated hypotheses per decision region are 2, 5, 70 and 70 for Navigation, LED, Fico and Compas datasets, respectively. ... We enumerate 15 hypotheses for each decision. ... We enumerate 70 hypotheses per decision region. ... We run each experiment with 5 different random seeds...
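Since the report notes that Thompson Sampling performed best and that the setup uses Beta(2,2) priors with fixed costs drawn uniformly from [0, 1], a generic cost-aware Thompson Sampling loop under those stated assumptions can be sketched as below. The cost trade-off (sampled success rate minus fixed cost) and all names are illustrative assumptions, not the authors' Algorithm 2:

```python
# Illustrative cost-aware Thompson Sampling with Beta(2, 2) priors and
# fixed per-arm costs drawn uniformly from [0, 1], mirroring the setup
# described above. A generic sketch, not the paper's algorithm.
import random

random.seed(0)
n_arms = 5
true_theta = [random.random() for _ in range(n_arms)]  # unknown success rates
cost = [random.random() for _ in range(n_arms)]        # fixed costs in [0, 1]
alpha = [2.0] * n_arms                                 # Beta(2, 2) priors
beta = [2.0] * n_arms

for t in range(1000):
    # Sample a plausible success rate per arm and trade it off against cost.
    index = [random.betavariate(alpha[i], beta[i]) - cost[i]
             for i in range(n_arms)]
    i = max(range(n_arms), key=index.__getitem__)
    reward = 1 if random.random() < true_theta[i] else 0
    # Conjugate Beta-Bernoulli update for the chosen arm only.
    alpha[i] += reward
    beta[i] += 1 - reward
```

Sampling from the posterior, rather than acting on its mean, is what gives Thompson Sampling its exploration: arms with uncertain success rates occasionally produce high samples and get tried.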