reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Leveraging Offline Data in Linear Latent Contextual Bandits

Authors: Chinmaya Kausik, Kevin Tan, Ambuj Tewari

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also establish the efficacy of our methods using experiments on both synthetic data and real-life movie recommendation data from MovieLens. ... Experiments: We establish the efficacy of our algorithms outlined above through a simulation study and a demonstration on a real recommendation problem with the MovieLens-1M (Harper and Konstan, 2015) dataset.
Researcher Affiliation	Academia	(1) Department of Statistics, University of Michigan, USA; (2) Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, USA.
Pseudocode	Yes	Algorithm 1: Subspace estimation from Offline Latent bandit Data (SOLD); Algorithm 2: Latent Offline subspace Constraints for Accelerating Linear UCB (LOCAL-UCB); Algorithm 3: Projection and Bonuses for Accelerating Latent bandit Linear UCB (ProBALL-UCB)
Open Source Code	Yes	See https://github.com/hetankevin/probono for source code.
Open Datasets	Yes	real-life movie recommendation data from MovieLens. ... MovieLens-1M (Harper and Konstan, 2015) dataset.
Dataset Splits	No	We generate U with U_{ij} i.i.d. Unif(0, 2.5 d Kd A ). We simulate the hidden labels θ_n ~ N(0, d_K^{-1} I_{d_K}), generate feature vectors ϕ(x_{n,h}, a_{n,h}) ~ N(0, I_{d_A}) normalized to unit norm, and sample noise ϵ_{n,h} i.i.d. N(0, 0.5^2). We use SOLD to estimate Û from the offline dataset D_off, which consists of 5000 trajectories of length 20 each. ... we filter the dataset to include only movies rated by at least 200 users and vice versa. We factor the sparse rating matrix into user parameters β and movie features Φ using the probabilistic matrix factorization algorithm... The subspace was estimated from 5000 trajectories of length 50 simulated from the reward model and the uniform behavior policy.
Hardware Specification	Yes	All experiments were run on a single computer with an Intel i9-13900k CPU, 128GB of RAM, and a NVIDIA RTX 3090 GPU, in no more than an hour in total.
Software Dependencies	No	The paper mentions algorithms and methods (e.g., LinUCB, k-means, probabilistic matrix factorization) but does not specify software libraries or frameworks with version numbers used for implementation.
Experiment Setup	Yes	In accordance with the confidence set determined by (Li et al., 2010), we choose α_{1,t} = 0.33 √(d_K log(1 + 10T/d_K)) and α_{2,t} = 0.33 √(d_A log(1 + 10T/d_A)), and share the LinUCB and ProBALL-UCB hyperparameters by assigning α_t = α_{2,t}. ... We use a simpler expression for off, set τ = 0, and choose a suitable value of the hyperparameter τ to adjust for overly conservative off. We later vary τ in ablation experiments to demonstrate that our results are not a consequence of our choice of hyperparameters.
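
The exploration coefficients quoted in the Experiment Setup row can be evaluated with a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the stray "p" in the extracted formula stands for a square root, and the function name and the dimension and horizon values below are hypothetical, chosen only for illustration.

```python
import math


def exploration_coefficient(dim: int, horizon: int, scale: float = 0.33) -> float:
    """Return scale * sqrt(dim * log(1 + 10 * horizon / dim)).

    Assumption: the garbled "p" in the quoted formula is a square root.
    """
    return scale * math.sqrt(dim * math.log(1 + 10 * horizon / dim))


# Hypothetical values, not taken from the paper.
d_K, d_A, T = 2, 30, 200

alpha_1 = exploration_coefficient(d_K, T)  # uses the latent dimension d_K
alpha_2 = exploration_coefficient(d_A, T)  # uses the ambient dimension d_A
alpha_shared = alpha_2  # the quoted setup assigns alpha_t = alpha_{2,t}

print(f"alpha_1 = {alpha_1:.4f}, alpha_2 = {alpha_2:.4f}")
```

With these illustrative values the coefficient tied to d_K comes out smaller than the one tied to d_A, since the expression grows with the dimension for a fixed horizon.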