Offline Learning for Combinatorial Multi-armed Bandits

Authors: Xutong Liu, Xiangxiang Dai, Jinhang Zuo, Siwei Wang, Carlee Joe-Wong, John C.S. Lui, Wei Chen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical Validation: "Finally, extensive experiments on both synthetic and real-world datasets for learning to rank and LLM caching validate the superior performance of CLCB compared to baseline algorithms."
Researcher Affiliation | Collaboration | 1 ECE Department, Carnegie Mellon University, Pittsburgh, PA, United States; 2 CSE Department, Chinese University of Hong Kong, Hong Kong SAR, China; 3 CS Department, City University of Hong Kong, Hong Kong SAR, China; 4 Microsoft Research, Beijing, China.
Pseudocode | Yes | Algorithm 1 (CLCB: Combinatorial Lower Confidence Bound Algorithm for Off-CMAB)
Open Source Code | No | The paper neither provides an explicit link to source code for the described methodology nor makes an unambiguous statement of code release.
Open Datasets | Yes | "For real-world evaluation, we use the Yelp dataset, where users rate businesses (Dai et al., 2024c). ... We use the SciQ dataset (Welbl et al., 2017)."
Dataset Splits | No | The paper reports run lengths (e.g., "n = 100 rounds") and specific cache sizes, but gives no training/validation/test splits, so the data partitioning cannot be reproduced.
Hardware Specification | Yes | "All tests were performed on a macOS system equipped with an Apple M3 Pro processor and 18 GB of RAM."
Software Dependencies | No | The paper mentions GPT-4o and GPT-4-turbo, along with OpenAI's tiktoken library and the OpenAI LLM API, but does not provide version numbers for these software components.
Experiment Setup | Yes | "In the synthetic setup, we simulate 100 distinct queries with a cache size of 40, following a power-law frequency distribution (α = 0.9) as in (Zhu et al., 2023). ... For the evaluation, we work with 100 distinct prompts from the SciQ dataset in an offline setting, performing a total of 10,000 queries with cache sizes of K = 10 and K = 20, respectively."
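The CLCB algorithm named in the Pseudocode row can be sketched as pessimistic top-K selection from offline data: score each base arm by a lower confidence bound and pick the K highest-scoring arms. This is an illustrative reconstruction, not the paper's exact Algorithm 1; the function name, the Hoeffding-style confidence radius, and the assumption that the super-arm reward is a sum over base arms are ours.

```python
import math

def clcb_select(counts, mean_rewards, K, delta=0.05):
    """Pessimistic top-K selection for an offline combinatorial bandit.

    counts[i]       -- number of offline observations of base arm i
    mean_rewards[i] -- empirical mean reward of base arm i
    Returns the indices of the K arms with the highest lower confidence
    bounds (illustrative sketch; the paper's Algorithm 1 may differ).
    """
    n_arms = len(counts)
    lcb = []
    for i in range(n_arms):
        if counts[i] == 0:
            lcb.append(float("-inf"))  # no data: maximally pessimistic
        else:
            # Hoeffding-style radius; shrinks as an arm is observed more
            radius = math.sqrt(math.log(2 * n_arms / delta) / (2 * counts[i]))
            lcb.append(mean_rewards[i] - radius)
    # Super arm = top-K arms by LCB (valid when the reward is additive)
    return sorted(range(n_arms), key=lambda i: lcb[i], reverse=True)[:K]
```

Pessimism is the key design choice for the offline setting: an arm with a high empirical mean but very few observations gets a wide radius and a low LCB, so it is not selected on the strength of noisy data alone.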
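The synthetic LLM-caching setup quoted in the Experiment Setup row (100 distinct queries, power-law frequencies with α = 0.9, 10,000 total queries) can be simulated with a short sketch. The function name and the Zipf-style weighting P(query i) ∝ 1/i^α are our assumptions, not the paper's exact generator.

```python
import random

def powerlaw_query_stream(n_queries=100, alpha=0.9, total=10_000, seed=0):
    """Sample a stream of query IDs whose frequencies follow a power law:
    P(query i) proportional to 1 / (i + 1) ** alpha (Zipf-like).
    Illustrative sketch of the paper's synthetic caching workload."""
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) ** alpha for i in range(n_queries)]
    return rng.choices(range(n_queries), weights=weights, k=total)
```

Under this distribution a small head of queries dominates the stream, which is exactly the regime where a cache of size K = 40 (or K = 10 / K = 20 in the SciQ runs) can absorb most of the traffic.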