On Stochastic Contextual Bandits with Knapsacks in Small Budget Regime
Authors: Hengquan Guo, Xin Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory is further supported by experiments conducted on a large-scale dataset. Experiments: We evaluate AUPD using a real-world learning-to-rank dataset in the small budget regime. Our experimental results demonstrate that AUPD outperforms baseline algorithms and achieves the best performance under various initial budgets. The large-scale learning-to-rank dataset MSLR-WEB30K (Qin & Liu, 2013) contains 31,278 queries. Each query includes various document-query contexts, each with a dimensionality of 136, and there are 20 documents/arms. The reward function r(x, a) is defined as the relevance of each document-context pair as collected in the dataset. We set the time horizon T = 5000 and vary the budgets B ∈ {100, 600, 1000}. Our experimental results are shown in Figure 1. These results are obtained by averaging over 50 trials and are reported with a 95% confidence interval. |
| Researcher Affiliation | Academia | Hengquan Guo and Xin Liu, School of Information Science and Technology, ShanghaiTech University |
| Pseudocode | Yes | Algorithm 1: Adaptive and Universal Primal-Dual Algorithm for CBwK |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use the MSLR-WEB30K dataset (Qin & Liu, 2013), which is available at https://www.microsoft.com/en-us/research/project/mslr/. This dataset has 31,278 arrivals, and the contextual dimension is 136. We extract \|A\| = 20 documents (arms) per query, and the reward is defined as the relevance judgments in the dataset, which take 5 values from 0 (irrelevant) to 4 (perfectly relevant). |
| Dataset Splits | No | The paper mentions that it "randomly draw an arrival from the 31,278 data points" but does not specify any training, test, or validation splits for the dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper states that "All the algorithms utilize gradient-boosted tree regression as the learning oracle for the reward function and the empirical mean for the cost function." However, it does not specify version numbers for the gradient-boosted tree regression library or any other software dependencies. |
| Experiment Setup | Yes | We set the time horizon T = 5000 and vary the budgets B ∈ {100, 600, 1000} to represent the budget regimes {Θ(√T), Θ(T^{3/4}), Θ(T)}, respectively. The interaction terminates once the budget is exhausted, incurring zero reward and cost for the remaining rounds. Further details on the experiments and hyperparameters can be found in Appendix D. Our experimental results are shown in Figure 1. These results are obtained by averaging over 50 trials and are reported with a 95% confidence interval. Table 2 (parameters in the experiment): AUPD: V = 0.1√T; PGD Adaptive (Chzhen et al., 2024): M_T = 4√T, 2√(T log T), δ = 1/√T; SquareCBwK (Han et al., 2023): U = √(T log T), γ = 0.1 |
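The evaluation protocol extracted above (run until the budget is exhausted, collect zero reward afterward, average over 50 trials, report a 95% confidence interval) can be sketched as follows. This is an illustrative harness only: the uniformly random reward/cost draws stand in for the AUPD policy and the dataset, which are not reproduced here, and all function names are our own.

```python
import math
import random

def run_trial(T=5000, B=100, seed=0):
    """One bandits-with-knapsacks trial: play until the budget B is
    exhausted, then earn zero reward/cost for the remaining rounds,
    matching the termination rule described in the setup."""
    rng = random.Random(seed)
    budget, total_reward = float(B), 0.0
    for _ in range(T):
        if budget <= 0:        # budget exhausted: stop accruing reward
            break
        reward = rng.random()  # placeholder for the policy's reward
        cost = rng.random()    # placeholder per-round resource consumption
        total_reward += reward
        budget -= cost
    return total_reward

def mean_ci95(values):
    """Sample mean and 95% normal-approximation confidence half-width."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, 1.96 * math.sqrt(var / n)

# Average over 50 trials, as in the reported experiments.
rewards = [run_trial(B=100, seed=s) for s in range(50)]
m, h = mean_ci95(rewards)
print(f"avg reward over 50 trials: {m:.1f} +/- {h:.1f}")
```

Repeating the loop for B = 600 and B = 1000 reproduces the three budget regimes; only the per-round reward/cost draws would need to be replaced by the actual policy and dataset.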