Leveraging Offline Data in Linear Latent Contextual Bandits
Authors: Chinmaya Kausik, Kevin Tan, Ambuj Tewari
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also establish the efficacy of our methods using experiments on both synthetic data and real-life movie recommendation data from MovieLens. ... Experiments: We establish the efficacy of our algorithms outlined above through a simulation study and a demonstration on a real recommendation problem with the MovieLens-1M (Harper and Konstan, 2015) dataset. |
| Researcher Affiliation | Academia | 1Department of Statistics, University of Michigan, USA 2Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, USA. |
| Pseudocode | Yes | Algorithm 1: Subspace estimation from Offline Latent bandit Data (SOLD); Algorithm 2: Latent Offline subspace Constraints for Accelerating Linear UCB (LOCAL-UCB); Algorithm 3: Projection and Bonuses for Accelerating Latent bandit Linear UCB (ProBALL-UCB) |
| Open Source Code | Yes | See https://github.com/hetankevin/probono for source code. |
| Open Datasets | Yes | real-life movie recommendation data from MovieLens. ... MovieLens-1M (Harper and Konstan, 2015) dataset. |
| Dataset Splits | No | We generate $U$ with $U_{ij}$ i.i.d. Unif(0, 2.5 d Kd A ). We simulate the hidden labels $\theta_n \sim \mathcal{N}(0, d_K^{-1} I_{d_K})$, generate feature vectors $\phi(x_{n,h}, a_{n,h}) \sim \mathcal{N}(0, I_{d_A})$ normalized to unit norm, and sample noise $\epsilon_{n,h}$ i.i.d. $\mathcal{N}(0, 0.5^2)$. We use SOLD to estimate $\hat{U}$ from the offline dataset $D_{\text{off}}$, which consists of 5000 trajectories of length 20 each. ... we filter the dataset to include only movies rated by at least 200 users and vice-versa. We factor the sparse rating matrix into user parameters $\beta$ and movie features $\Phi$ using the probabilistic matrix factorization algorithm... The subspace was estimated from 5000 trajectories of length 50 simulated from the reward model and the uniform behavior policy. |
| Hardware Specification | Yes | All experiments were run on a single computer with an Intel i9-13900k CPU, 128GB of RAM, and an NVIDIA RTX 3090 GPU, in no more than an hour in total. |
| Software Dependencies | No | The paper mentions algorithms and methods (e.g., LinUCB, k-means, probabilistic matrix factorization) but does not specify software libraries or frameworks with version numbers used for implementation. |
| Experiment Setup | Yes | In accordance with the confidence set determined by (Li et al., 2010), we choose $\alpha_{1,t} = 0.33\sqrt{d_K \log(1 + 10T/d_K)}$ and $\alpha_{2,t} = 0.33\sqrt{d_A \log(1 + 10T/d_A)}$, and share the LinUCB and ProBALL-UCB hyperparameters by assigning $\alpha_t = \alpha_{2,t}$. ... We use a simpler expression for $\Delta_{\text{off}}$, set τ = 0, and choose a suitable value of the hyperparameter τ to adjust for overly conservative $\Delta_{\text{off}}$. We later vary τ in ablation experiments to demonstrate that our results are not a consequence of our choice of hyperparameters. |
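
The exploration coefficients and the synthetic sampling steps quoted in the table can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation (see the linked repository for that). The function names and the dimensions `d_K = 2`, `d_A = 8` and horizon `T = 200` are hypothetical placeholders, and the latent-label covariance is read as $I_{d_K}/d_K$ from the extracted text, which is an assumption. The generation of $U$ is left out because its sampling range is not recoverable from the excerpt.

```python
import math

import numpy as np


def exploration_coefficient(dim: int, horizon: int, scale: float = 0.33) -> float:
    """Return scale * sqrt(dim * log(1 + 10 * horizon / dim))."""
    return scale * math.sqrt(dim * math.log(1.0 + 10.0 * horizon / dim))


def sample_unit_features(rng, n_traj: int, length: int, d_A: int) -> np.ndarray:
    """Draw N(0, I) feature vectors and normalize each one to unit norm."""
    phi = rng.standard_normal((n_traj, length, d_A))
    return phi / np.linalg.norm(phi, axis=-1, keepdims=True)


def sample_latent_labels(rng, n_traj: int, d_K: int) -> np.ndarray:
    """Draw one latent label per trajectory from N(0, I / d_K) (assumed reading)."""
    return rng.standard_normal((n_traj, d_K)) / math.sqrt(d_K)


def sample_noise(rng, n_traj: int, length: int, sigma: float = 0.5) -> np.ndarray:
    """Draw i.i.d. N(0, sigma^2) reward noise for every step."""
    return sigma * rng.standard_normal((n_traj, length))


if __name__ == "__main__":
    # Hypothetical dimensions and horizon, chosen only for illustration.
    d_K, d_A, T = 2, 8, 200
    alpha_1 = exploration_coefficient(d_K, T)
    alpha_2 = exploration_coefficient(d_A, T)
    print(f"alpha_1 = {alpha_1:.4f}, alpha_2 = {alpha_2:.4f}")

    # Offline dataset size quoted in the table: 5000 trajectories of length 20.
    rng = np.random.default_rng(0)
    phi = sample_unit_features(rng, 5000, 20, d_A)
    theta = sample_latent_labels(rng, 5000, d_K)
    eps = sample_noise(rng, 5000, 20)
    print(phi.shape, theta.shape, eps.shape)
```

Both coefficients come from the same helper, so sharing hyperparameters between LinUCB and ProBALL-UCB, as the excerpt describes, amounts to passing the value computed with `d_A` to both.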