Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy
Authors: Ishank Juneja, Carlee Joe-Wong, Osman Yagan
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experiments are conducted using the Movie Lens 25M and Goodreads datasets for both PE and PE-CS revealing the effectiveness of PE and the superior balance between performance and reliability offered by PE-CS compared to baselines from the literature. |
| Researcher Affiliation | Academia | Ishank Juneja, Carlee Joe-Wong & Osman Yagan Department of Electrical and Computer Engineering Carnegie Mellon University Pittsburgh, PA 15213, USA EMAIL |
| Pseudocode | Yes | Algorithm 1: Pairwise Elimination (PE) for a known reference arm ℓ Algorithm 2: Pairwise Elimination for Cost Subsidy Problem (PE-CS). Algorithm 3: Cost-Subsidized Explore-Then-Commit (ETC-CS) Algorithm 4: Cost-Subsidized Thompson Sampling with Beta Priors (TS-CS) Algorithm 5: Cost-Subsidized UCB (UCB-CS) Algorithm 6: FIXED THRESHOLD UCB (FT-UCB) |
| Open Source Code | Yes | Code available at https://github.com/ishank-juneja/bandits-with-costs |
| Open Datasets | Yes | Finally, experiments are conducted using the Movie Lens 25M and Goodreads datasets for both PE and PE-CS revealing the effectiveness of PE and the superior balance between performance and reliability offered by PE-CS compared to baselines from the literature. F. Maxwell Harper and Joseph A. Konstan. The movielens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4), dec 2015. ISSN 2160-6455. doi: 10.1145/2827872. URL https://doi.org/10.1145/2827872. Mengting Wan and Julian Mc Auley. Item recommendation on monotonic behavior chains. In Proceedings of the 12th ACM conference on recommender systems, pp. 86 94, 2018. |
| Dataset Splits | No | The paper describes how bandit instances are constructed from the Movie Lens and Goodreads datasets (e.g., 20 arms for Movie Lens, 8 for Goodreads) and how rewards and costs are assigned. However, it does not provide specific training/test/validation splits for these datasets in the traditional sense of partitioning the data for distinct training, validation, and testing phases. The experiments involve sequential sampling over a time horizon T (e.g., T=5M). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | Data points represent terminal regret at T =5M and each data point represents the outcome from an experiment. There are 25 such independent runs for each algorithm. For every genre we first obtain the mean 5-point scale rating of all books or movies tagged with that genre and then divide this rating by 5 so that it lies between 0 and 1. We then treat this fractional rating as the expected reward return from that genre. |