Combinatorial Multi-armed Bandits: Arm Selection via Group Testing

Authors: Arpan Mukherjee, Shashanka Ubaru, Keerthiram Murugesan, Karthikeyan Shanmugam, Ali Tajer

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide empirical results to assess the performance of GT+QTS and compare it against the state-of-the-art CTS algorithm provided in (Wang & Chen, 2018) equipped with Oracle+ described in Section 4 as the exact oracle. We consider two reward functions: linear rewards and non-linear rewards modeled as the output of a 2-layer artificial neural network. We conduct experiments on both synthetic data and real-world datasets.
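The semi-bandit environment quoted in this row can be sketched minimally. This is a hypothetical illustration, not the authors' code: the arm count, super-arm size, and Gaussian observation noise follow the setup quoted in the Experiment Setup row, while the noise level 0.1 is an assumption.

```python
import numpy as np

def super_arm_reward(theta, S, rng):
    # Linear reward: r(S; theta) = sum of per-arm means over S, observed
    # with independent per-arm noise (semi-bandit feedback). The 0.1
    # noise scale is an illustrative assumption, not from the paper.
    noise = rng.normal(0.0, 0.1, size=len(S))
    return theta[S].sum() + noise.sum()

rng = np.random.default_rng(0)
m, K = 5000, 5
mu = rng.uniform(0.0, 1.0, size=m)        # mean vector sampled from [0, 1]^m
S = rng.choice(m, size=K, replace=False)  # a super-arm of K = 5 arms
r = super_arm_reward(mu, S, rng)
```

Because the reward is additive over arms, the noiseless reward of any super-arm is bounded by K, which is why the exact oracle for this setting reduces to top-K selection.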
Researcher Affiliation | Collaboration | Arpan Mukherjee (EMAIL), Department of Electrical and Electronic Engineering, Imperial College London; Shashanka Ubaru (EMAIL), IBM Research; Keerthiram Murugesan (EMAIL), IBM Research; Karthikeyan Shanmugam (EMAIL), IBM Research; Ali Tajer (EMAIL), Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute
Pseudocode | Yes | Algorithm 1: GT+QTS Algorithm; Algorithm 2: Oracle(θ)
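For the linear reward r(S; θ) = Σ_{i∈S} θ_i quoted in the Experiment Setup row, an exact oracle reduces to picking the K largest entries of θ. A minimal sketch under that assumption (not a transcription of the paper's Algorithm 2):

```python
import numpy as np

def oracle(theta, K):
    # Exact oracle for additive rewards: the K arms with the largest
    # entries of theta maximize r(S; theta) = sum_{i in S} theta_i.
    return np.argpartition(theta, -K)[-K:]

theta = np.array([0.2, 0.9, 0.1, 0.8, 0.5])
best = set(oracle(theta, 2).tolist())  # -> {1, 3}
```

`np.argpartition` returns the top-K indices in O(m) time without fully sorting, which matters at the paper's scale of m = 5000 arms.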
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository for the methodology described. It mentions OpenReview for peer review, which is not a code repository.
Open Datasets | Yes | To capture the regret efficiency of the proposed GT-QTS algorithm, we also test its performance on the MovieLens-100K dataset (Harper & Konstan, 2015), which consists of 100,000 ratings from 943 users for 1682 movies.
Dataset Splits | No | The paper describes using the MovieLens-100K dataset as a source of feedback in a bandit setting, stating "Each user is asked to annotate a minimum of 20 movies" and "we uniformly randomly select a user". However, it does not provide explicit training, test, or validation dataset splits for model evaluation in a traditional supervised learning sense. The experimental setup revolves around sequential feedback in a bandit problem, rather than static data partitions.
Hardware Specification | Yes | The experiment is averaged over 100 independent trials, and the simulation is performed using Python 3.9.21 on a MacBook Pro with an M1 Pro processor and 16 GB of RAM.
Software Dependencies | Yes | The experiment is averaged over 100 independent trials, and the simulation is performed using Python 3.9.21 on a MacBook Pro with an M1 Pro processor and 16 GB of RAM.
Experiment Setup | Yes | For GT-QTS, we have two hyperparameters: (1) the entries of the encoding and decoding matrix A, and (2) the number of tests ℓ. For (1), the entries A_{i,j} are drawn from a Bernoulli distribution with parameter p. In Lemma 1, we show that choosing p = 0.5 results in an O(log m) number of tests. ... In this experiment, for any set S ⊆ I and any θ ∈ [0, 1]^m, we define the reward function as r(S; θ) = Σ_{i∈S} θ_i. We set m = 5000 arms, and the mean vector µ is sampled uniformly randomly from [0, 1]^5000. Furthermore, the cardinality constraint is set to K = 5 arms. ... Next, we evaluate the performance of GT+QTS on nonlinear mean reward functions. Specifically, we choose a 2-layer NN with 20 neurons and a sigmoid activation function. ... In the experiment, we uniformly randomly select a user and adopt the goal of recommending a set of 5 movies that match the user's preferences. Here, movies are designated as arms of a multi-armed bandit. In each round, the learner selects a super-arm of size K = 5 and receives feedback (rating) for each arm. ... For implementing the GT-QTS algorithm, we set = 0.2, which readily follows from the observation that movie ratings lie in the set {1, 2, 3, 4, 5}.
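The hyperparameter (1) quoted above, a Bernoulli(p = 0.5) encoding matrix with ℓ = O(log m) tests, can be sketched as follows. This is an illustration of the pooling step only, under the quoted parameters; the specific ℓ = ⌈log₂ m⌉, the sparse vector x, and the decoding are assumptions, not the authors' procedure.

```python
import numpy as np

def make_encoding_matrix(m, ell, p=0.5, seed=0):
    # Entries A[i, j] ~ Bernoulli(p); the quoted Lemma 1 motivates
    # p = 0.5, for which O(log m) tests suffice.
    rng = np.random.default_rng(seed)
    return (rng.random((ell, m)) < p).astype(int)

m = 5000
ell = int(np.ceil(np.log2(m)))  # illustrative O(log m) choice of tests
A = make_encoding_matrix(m, ell)

# Pooled (group-tested) observation of an arm-level vector x: each of the
# ell tests aggregates the arms whose column entry is 1 in that row.
x = np.zeros(m)
x[[7, 42]] = 1.0  # hypothetical 2-sparse vector of active arms
y = A @ x
```

Each entry of y counts how many active arms fall in that test's pool, so ℓ ≈ log₂ m measurements compress the m-dimensional arm space, which is the source of the complexity savings the row describes.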