Leveraging Offline Data in Linear Latent Contextual Bandits

Authors: Chinmaya Kausik, Kevin Tan, Ambuj Tewari

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also establish the efficacy of our methods using experiments on both synthetic data and real-life movie recommendation data from MovieLens. ... Experiments: We establish the efficacy of our algorithms outlined above through a simulation study and a demonstration on a real recommendation problem with the MovieLens-1M (Harper and Konstan, 2015) dataset.
Researcher Affiliation | Academia | (1) Department of Statistics, University of Michigan, USA; (2) Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, USA.
Pseudocode | Yes | Algorithm 1: Subspace estimation from Offline Latent bandit Data (SOLD); Algorithm 2: Latent Offline subspace Constraints for Accelerating Linear UCB (LOCAL-UCB); Algorithm 3: Projection and Bonuses for Accelerating Latent bandit Linear UCB (ProBALL-UCB).
Open Source Code | Yes | See https://github.com/hetankevin/probono for source code.
Open Datasets | Yes | real-life movie recommendation data from MovieLens. ... MovieLens-1M (Harper and Konstan, 2015) dataset.
Dataset Splits | No | We generate $U$ with $U_{ij} \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}(0, 2.5\, d_K d_A)$. We simulate the hidden labels $\theta_n \sim \mathcal{N}(0, d_K^{-1} I_{d_K})$, generate feature vectors $\phi(x_{n,h}, a_{n,h}) \sim \mathcal{N}(0, I_{d_A})$ normalized to unit norm, and sample noise $\epsilon_{n,h} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 0.5^2)$. We use SOLD to estimate $\hat{U}$ from the offline dataset $D_{\text{off}}$, which consists of 5000 trajectories of length 20 each. ... we filter the dataset to include only movies rated by at least 200 users and vice-versa. We factor the sparse rating matrix into user parameters $\beta$ and movie features $\Phi$ using the probabilistic matrix factorization algorithm... The subspace was estimated from 5000 trajectories of length 50 simulated from the reward model and the uniform behavior policy.
Hardware Specification | Yes | All experiments were run on a single computer with an Intel i9-13900k CPU, 128GB of RAM, and an NVIDIA RTX 3090 GPU, in no more than an hour in total.
Software Dependencies | No | The paper mentions algorithms and methods (e.g., LinUCB, k-means, probabilistic matrix factorization) but does not specify software libraries or frameworks with version numbers used for implementation.
Experiment Setup | Yes | In accordance with the confidence set determined by (Li et al., 2010), we choose $\alpha_{1,t} = 0.33\sqrt{d_K \log(1 + 10T/d_K)}$ and $\alpha_{2,t} = 0.33\sqrt{d_A \log(1 + 10T/d_A)}$, and share the LinUCB and ProBALL-UCB hyperparameters by assigning $\alpha_t = \alpha_{2,t}$. ... We use a simpler expression for $\Delta_{\text{off}}$, set τ = 0, and choose a suitable value of the hyperparameter τ to adjust for overly conservative $\Delta_{\text{off}}$. We later vary τ in ablation experiments to demonstrate that our results are not a consequence of our choice of hyperparameters.
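
The synthetic setup and exploration coefficients quoted in the rows above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that). In particular, the reward model r = <phi, U theta> + noise, the shape of U (d_A by d_K), the dimensions used in the usage lines, and the upper bound `u_scale` of the uniform distribution are assumptions made here: the quoted text does not state the reward model or the dimensions, and the extracted formula for the uniform bound is ambiguous, so it is left as a parameter.

```python
import numpy as np


def exploration_coefficient(dim, horizon, scale=0.33):
    """Return scale * sqrt(dim * log(1 + 10 * horizon / dim)).

    Mirrors the form of alpha_{1,t} (dim = d_K) and alpha_{2,t} (dim = d_A)
    quoted in the Experiment Setup row.
    """
    return scale * np.sqrt(dim * np.log(1.0 + 10.0 * horizon / dim))


def simulate_offline_data(n_traj, horizon, d_A, d_K, u_scale, noise_sd=0.5, seed=0):
    """Simulate offline trajectories for a linear latent bandit (illustrative).

    Assumptions not confirmed by the quoted text: U has shape (d_A, d_K),
    rewards are <phi, U theta> + noise, and u_scale is a free parameter.
    """
    rng = np.random.default_rng(seed)
    # Entries of U drawn i.i.d. from Unif(0, u_scale).
    U = rng.uniform(0.0, u_scale, size=(d_A, d_K))
    # One hidden label per trajectory: theta_n ~ N(0, I / d_K).
    theta = rng.normal(0.0, np.sqrt(1.0 / d_K), size=(n_traj, d_K))
    # Feature vectors ~ N(0, I), then normalized to unit norm.
    phi = rng.normal(size=(n_traj, horizon, d_A))
    phi /= np.linalg.norm(phi, axis=-1, keepdims=True)
    # Per-trajectory reward parameter beta_n = U theta_n (assumed model).
    beta = theta @ U.T
    # Noise ~ N(0, noise_sd^2), i.i.d. across trajectories and steps.
    noise = rng.normal(0.0, noise_sd, size=(n_traj, horizon))
    rewards = np.einsum("nhd,nd->nh", phi, beta) + noise
    return U, theta, phi, rewards


# Usage with the quoted dataset size (5000 trajectories of length 20).
# d_A = 8, d_K = 2, and u_scale = 1.0 are hypothetical values.
U, theta, phi, rewards = simulate_offline_data(
    n_traj=5000, horizon=20, d_A=8, d_K=2, u_scale=1.0
)
alpha_1 = exploration_coefficient(dim=2, horizon=20)  # low-dimensional coefficient
alpha_2 = exploration_coefficient(dim=8, horizon=20)  # full-dimensional coefficient
```

Keeping `u_scale` and the dimensions as explicit arguments makes it easy to substitute the values from the paper or its repository once they are confirmed.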