Improved Online Confidence Bounds for Multinomial Logistic Bandits
Authors: Joongkyu Lee, Min-Hwan Oh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the performance of our algorithms, OFU-MNL++ and OFU-M2NL, by measuring cumulative regret over T = 3000 rounds. The algorithms are tested on 20 independent instances, and we report the average performance along with a shaded area representing two standard deviations. In each instance, the true underlying parameter w is uniformly sampled from the d-dimensional ball B^d(B) of radius B, and the context features x_{t,i} are drawn from B^d(1). The rewards are sampled from a uniform distribution in each round, i.e., r_{t,i} ~ Unif(0, 1). The baselines are the practical and state-of-the-art algorithms: the UCB-based algorithm, UCB-MNL (Oh & Iyengar, 2019), the Thompson Sampling-based algorithm, TS-MNL (Oh & Iyengar, 2019), and the constant-time algorithm, OFU-MNL+ (Lee & Oh, 2024). Figure 1 shows that both of our algorithms significantly outperform the baseline algorithms. |
| Researcher Affiliation | Academia | 1Seoul National University, Seoul, Korea. |
| Pseudocode | Yes | Algorithm 1 OFU-MNL++ Algorithm 2 RS-OMD, Restricted Space OMD |
| Open Source Code | No | The paper does not contain any explicit statement about open-sourcing the code for the methodology described, nor does it provide any links to a code repository. |
| Open Datasets | No | In each instance, the true underlying parameter w is uniformly sampled from the d-dimensional ball B^d(B) of radius B, and the context features x_{t,i} are drawn from B^d(1). The rewards are sampled from a uniform distribution in each round, i.e., r_{t,i} ~ Unif(0, 1). This indicates that data is generated for the experiments, not taken from an existing public dataset for which access information would be provided. |
| Dataset Splits | No | The paper describes generating synthetic data for its experiments, rather than using pre-existing datasets with defined splits. It states: "In each instance, the true underlying parameter w is uniformly sampled from the d-dimensional ball B^d(B) of radius B, and the context features x_{t,i} are drawn from B^d(1). The rewards are sampled from a uniform distribution in each round, i.e., r_{t,i} ~ Unif(0, 1)." |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. It does not mention any GPU or CPU models, or other hardware specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for reproducing the experiments. |
| Experiment Setup | Yes | We empirically evaluate the performance of our algorithms, OFU-MNL++ and OFU-M2NL, by measuring cumulative regret over T = 3000 rounds. The algorithms are tested on 20 independent instances, and we report the average performance along with a shaded area representing two standard deviations. In each instance, the true underlying parameter w is uniformly sampled from the d-dimensional ball B^d(B) of radius B, and the context features x_{t,i} are drawn from B^d(1). The rewards are sampled from a uniform distribution in each round, i.e., r_{t,i} ~ Unif(0, 1). We set the number of items to N = 50 and the maximum assortment size to K = 5. As an additional experiment, Figure G.1 presents results for a larger value of d, specifically d = 10. In our experiments, since the threshold τ_t is too conservative in practice, we empirically tuned the hyperparameter τ_t for OFU-MNL++ by searching over a certain range of values while maintaining its inverse relationship with α (i.e., a higher τ_t corresponds to a lower α). |
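The synthetic instance generation quoted above (parameter w uniform on the ball B^d(B), contexts on B^d(1), rewards Unif(0, 1)) can be sketched as follows. This is a minimal illustration, not the authors' code: the dimension d = 5 is an assumed placeholder (the report only states d = 10 for the additional experiment), and `sample_ball`/`make_instance` are hypothetical helper names.

```python
import numpy as np

def sample_ball(rng, d, radius):
    """Draw a point uniformly from the d-dimensional ball of the given radius
    (Gaussian direction, radius scaled by U^(1/d))."""
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    return radius * rng.uniform() ** (1.0 / d) * v

def make_instance(rng, d=5, N=50, B=1.0):
    """Generate one synthetic MNL bandit instance as described in the report.
    d = 5 and B = 1.0 are illustrative assumptions."""
    w = sample_ball(rng, d, B)                                    # true parameter, w in B^d(B)
    X = np.stack([sample_ball(rng, d, 1.0) for _ in range(N)])    # context features x_{t,i} in B^d(1)
    r = rng.uniform(0.0, 1.0, size=N)                             # rewards r_{t,i} ~ Unif(0, 1)
    return w, X, r

rng = np.random.default_rng(0)
w, X, r = make_instance(rng)
```

With N = 50 items per instance, an assortment of at most K = 5 items would then be chosen from these contexts in each of the T = 3000 rounds.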