A Classification View on Meta Learning Bandits
Authors: Mirco Mutti, Jeongyeol Kwon, Shie Mannor, Aviv Tamar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 provides numerical experiments that showcase our algorithms against UCB/TS-like approaches for latent bandits (Hong et al., 2020a). |
| Researcher Affiliation | Collaboration | 1Technion Israel Institute of Technology 2University of Wisconsin-Madison 3NVIDIA Research. |
| Pseudocode | Yes | Algorithm 1: Explicit Classify then Exploit; Algorithm 2: Update Remaining Hypotheses; Algorithm 3: Meta Training; Algorithm 4: Decision Tree; Algorithm 5: Greedy Test; Algorithm 6: Decision Tree Explicit Classify then Exploit. |
| Open Source Code | Yes | The code to reproduce the experiments can be found at https://github.com/muttimirco/ece. |
| Open Datasets | No | The experiments use synthetically generated bandit instances rather than a public dataset: "For the purpose of the experiments, we consider a non-contextual stochastic MAB setting in which the collection of bandits is fully known, without covering class misspecifications. We design two families of collections, one inspired by the hard instance presented in Section 3.1, which we henceforth call hard, and one randomly generated collection, which we call rand. For the former, we consider two instances with size M = 5 and arms K = 10, with varying values of the separation parameter λ (0.4 and 0.04, respectively). For the latter, we consider a small instance M = 10, K = 20 and a large instance M = 40, K = 40. We use rejection sampling to control λ (set to 0.4) in the randomly generated collection. In all the considered instances, the reward distributions are Bernoulli." |
| Dataset Splits | No | The paper describes a 'meta training' and 'test' phase, which are conceptual stages for learning within the bandit framework. It does not provide specific data splits for a fixed dataset (e.g., percentages of train/test/validation data for a static dataset), but rather defines how bandit instances are generated and used in simulation. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions general algorithms like mUCB and mTS, and describes its own DT-ECE algorithm, but it does not specify any particular software libraries, frameworks, or their version numbers that would be required to reproduce the experiments. |
| Experiment Setup | Yes | For the purpose of the experiments, we consider a non-contextual stochastic MAB setting in which the collection of bandits is fully known, without covering class misspecifications. We design two families of collections, one inspired by the hard instance presented in Section 3.1, which we henceforth call hard, and one randomly generated collection, which we call rand. For the former, we consider two instances with size M = 5 and arms K = 10, with varying values of the separation parameter λ (0.4 and 0.04, respectively). For the latter, we consider a small instance M = 10, K = 20 and a large instance M = 40, K = 40. We use rejection sampling to control λ (set to 0.4) in the randomly generated collection. In all the considered instances, the reward distributions are Bernoulli. The curves average 20 independent runs; shaded regions are 95% confidence intervals. |
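The rejection sampling used to control λ in the rand collection can be sketched as follows. This is a minimal illustration under an assumption, not the authors' code (which is at the linked repository): the report does not define the separation parameter λ precisely, so it is assumed here to be the minimum L-infinity distance between the mean-reward vectors of any two bandits in the collection.

```python
import numpy as np

def sample_collection(M, K, lam, rng, max_tries=10_000):
    """Rejection-sample a collection of M Bernoulli bandits with K arms each.

    ASSUMPTION: lam is interpreted as the minimum L-infinity distance
    between the mean vectors of any two bandits; the paper's exact
    definition of the separation parameter may differ.
    """
    for _ in range(max_tries):
        # Draw candidate Bernoulli means uniformly in [0, 1].
        means = rng.random((M, K))
        # Pairwise L-infinity distances between bandit models.
        dists = np.abs(means[:, None, :] - means[None, :, :]).max(axis=-1)
        # Minimum separation over all distinct pairs (ignore the diagonal).
        sep = dists[~np.eye(M, dtype=bool)].min()
        if sep >= lam:  # accept only sufficiently separated collections
            return means
    raise RuntimeError("rejection sampling failed; try a smaller lam")

# Hypothetical usage mirroring the small rand instance (M = 10, K = 20, λ = 0.4).
rng = np.random.default_rng(0)
collection = sample_collection(M=10, K=20, lam=0.4, rng=rng)
```

With uniformly drawn means, most candidate collections already satisfy a separation of 0.4 in L-infinity, so acceptance is fast for these instance sizes; tighter λ definitions or larger M would reject more candidates.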