A Classification View on Meta Learning Bandits
Authors: Mirco Mutti, Jeongyeol Kwon, Shie Mannor, Aviv Tamar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 provides numerical experiments that showcase our algorithms against UCB/TS-like approaches for latent bandits (Hong et al., 2020a). |
| Researcher Affiliation | Collaboration | 1Technion Israel Institute of Technology 2University of Wisconsin-Madison 3NVIDIA Research. |
| Pseudocode | Yes | Algorithm 1: Explicit Classify then Exploit; Algorithm 2: Update Remaining Hypotheses; Algorithm 3: Meta Training; Algorithm 4: Decision Tree; Algorithm 5: Greedy Test; Algorithm 6: Decision Tree Explicit Classify then Exploit. |
| Open Source Code | Yes | The code to reproduce the experiments can be found at https://github.com/muttimirco/ece. |
| Open Datasets | No | The experiments use synthetically generated bandit instances rather than a public dataset: "For the purpose of the experiments, we consider a non-contextual stochastic MAB setting in which the collection of bandits is fully known, without covering class misspecifications. We design two families of collections, one inspired by the hard instance presented in Section 3.1, which we henceforth call hard, and one randomly generated collection, which we call rand. For the former, we consider two instances with size M = 5 and arms K = 10, with varying values of the separation parameter λ (0.4 and 0.04, respectively). For the latter, we consider a small instance M = 10, K = 20 and a large instance M = 40, K = 40. We use rejection sampling to control λ (set to 0.4) in the randomly generated collection. In all the considered instances, the reward distributions are Bernoulli." |
| Dataset Splits | No | The paper describes a 'meta training' and 'test' phase, which are conceptual stages for learning within the bandit framework. It does not provide specific data splits for a fixed dataset (e.g., percentages of train/test/validation data for a static dataset), but rather defines how bandit instances are generated and used in simulation. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions general algorithms like mUCB and mTS, and describes its own DT-ECE algorithm, but it does not specify any particular software libraries, frameworks, or their version numbers that would be required to reproduce the experiments. |
| Experiment Setup | Yes | For the purpose of the experiments, we consider a non-contextual stochastic MAB setting in which the collection of bandits is fully known, without covering class misspecifications. We design two families of collections, one inspired by the hard instance presented in Section 3.1, which we henceforth call hard, and one randomly generated collection, which we call rand. For the former, we consider two instances with size M = 5 and arms K = 10, with varying values of the separation parameter λ (0.4 and 0.04, respectively). For the latter, we consider a small instance M = 10, K = 20 and a large instance M = 40, K = 40. We use rejection sampling to control λ (set to 0.4) in the randomly generated collection. In all the considered instances, the reward distributions are Bernoulli. The curves average 20 independent runs; shaded regions are 95% confidence intervals. |
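The rejection sampling used to control λ in the rand collection can be sketched as follows. This is a minimal illustration under an assumption, not the authors' code (which is at the linked repository): the report does not define the separation parameter λ precisely, so it is assumed here to be the minimum L-infinity distance between the mean-reward vectors of any two bandits in the collection.

```python
import numpy as np

def sample_collection(M, K, lam, rng, max_tries=10_000):
    """Rejection-sample a collection of M Bernoulli bandits with K arms each.

    ASSUMPTION: lam is interpreted as the minimum L-infinity distance
    between the mean vectors of any two bandits; the paper's exact
    definition of the separation parameter may differ.
    """
    for _ in range(max_tries):
        # Draw candidate Bernoulli means uniformly in [0, 1].
        means = rng.random((M, K))
        # Pairwise L-infinity distances between bandit models.
        dists = np.abs(means[:, None, :] - means[None, :, :]).max(axis=-1)
        # Minimum separation over all distinct pairs (ignore the diagonal).
        sep = dists[~np.eye(M, dtype=bool)].min()
        if sep >= lam:  # accept only sufficiently separated collections
            return means
    raise RuntimeError("rejection sampling failed; try a smaller lam")

# Hypothetical usage mirroring the small rand instance (M = 10, K = 20, λ = 0.4).
rng = np.random.default_rng(0)
collection = sample_collection(M=10, K=20, lam=0.4, rng=rng)
```

With uniformly drawn means, most candidate collections already satisfy a separation of 0.4 in L-infinity, so acceptance is fast for these instance sizes; tighter λ definitions or larger M would reject more candidates.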