Reinforcement learning with combinatorial actions for coupled restless bandits
Authors: Lily Xu, Bryan Wilder, Elias Khalil, Milind Tambe
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate SEQUOIA on four novel restless bandit problems with combinatorial constraints: multiple interventions, path constraints, bipartite matching, and capacity constraints. Our approach significantly outperforms existing methods, which cannot address sequential planning and combinatorial selection simultaneously, by an average of 24.8% on these difficult instances. |
| Researcher Affiliation | Academia | Lily Xu (1,2), Bryan Wilder (3), Elias B. Khalil (4), Milind Tambe (1); 1: Harvard University, 2: University of Oxford, 3: Carnegie Mellon University, 4: University of Toronto |
| Pseudocode | Yes | Algorithm 1 SEQUOIA for RL with combinatorial actions Input: MDP instance (S, A, P, R) and a set of constraints C A Parameter: Epsilon-greedy parameter ϵ, target update frequency F |
| Open Source Code | Yes | Contact: EMAIL. Code available at: https://github.com/lily-x/combinatorial-rmab |
| Open Datasets | No | For each problem instance, we randomly generate transition probabilities, constraints, and initial states. We ensure consistency by enforcing that transition dynamics and initial states for every problem instance (and every episode) are identical across all methods. We evaluate the expected reward of each algorithm across 50 episodes of length H = 20 and average results over 30 random seeds. The paper describes a simulation setup where data is randomly generated for each problem instance rather than using a fixed publicly available dataset with specific access information. While map data for one problem is mentioned as available in a GitHub repo, it is not the primary experimental dataset. |
| Dataset Splits | No | For each problem instance, we randomly generate transition probabilities, constraints, and initial states. We ensure consistency by enforcing that transition dynamics and initial states for every problem instance (and every episode) are identical across all methods. We evaluate the expected reward of each algorithm across 50 episodes of length H = 20 and average results over 30 random seeds. The paper describes a simulation-based experimental setup where data is randomly generated per instance and evaluated across episodes, but it does not specify fixed training, validation, or test splits for a static dataset. |
| Hardware Specification | Yes | Experiments are run on a cluster running CentOS with Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.1 GHz with 8GB of RAM using Python 3.9.12. |
| Software Dependencies | Yes | Our implementation uses PyTorch and TorchRL (Bou et al., 2024) libraries. We build off code from Dumouchelle et al. (2022) to embed a neural network with single-layer ReLU activations as a MILP (Fischetti & Jo, 2018). ... The MIP was solved using Gurobi optimizer 10.0.2. ... Python 3.9.12. |
| Experiment Setup | Yes | To train the Q-network, we use the hyperparameters specified in Table 2. Table 2 (hyperparameters used to train SEQUOIA): discount factor γ = 0.99; ε-greedy schedule: 0.9 starting, 0.05 final, exponential decay rate 1000; learning rate 2.5×10⁻⁴; Adam ε = 1.5×10⁻⁴; target update rate τ = 1×10⁻⁶; minibatch size 32; replay memory size 10,000; number of training episodes 100; network architecture: two hidden layers of size 32 each. |
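The hyperparameter row describes a fairly standard DQN-style setup (discount factor, ε-greedy exploration with exponential decay, target-network updates). As a minimal sketch of how those quoted values fit together, assuming the usual exponential-decay formula for the exploration rate (the paper's exact schedule and training loop live in the linked repository, not here):

```python
import math

# Hyperparameter values quoted from Table 2 of the paper.
GAMMA = 0.99          # discount factor
EPS_START = 0.9       # starting epsilon-greedy value
EPS_END = 0.05        # final epsilon-greedy value
EPS_DECAY = 1000      # rate of exponential decay

def epsilon(step: int) -> float:
    """Exploration rate after `step` training steps.

    Uses the common DQN exponential-decay schedule; that this is the
    paper's exact formula is an assumption based on the listed values.
    """
    return EPS_END + (EPS_START - EPS_END) * math.exp(-step / EPS_DECAY)
```

Under this schedule, exploration starts at 0.9 and decays smoothly toward 0.05, reaching roughly the midpoint after about `EPS_DECAY` steps; the γ = 0.99 discount would then weight future rewards in the usual Q-learning target.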