Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Authors: Tianhao Wang, Dongruo Zhou, Quanquan Gu

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In Section 6 we present the numerical experiment which supports our theory. |
| Researcher Affiliation | Academia | Tianhao Wang, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, EMAIL; Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, EMAIL; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, EMAIL |
| Pseudocode | Yes | Algorithm 1 LSVI-UCB-Batch (a sketch of its update schedule follows the table) |
| Open Source Code | No | (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | No | We run our algorithms, LSVI-UCB-Batch and LSVI-UCB-RareSwitch, on a synthetic linear MDP given in Example 6.1, and compare them with the fully adaptive baseline, LSVI-UCB (Jin et al., 2020). |
| Dataset Splits | No | The paper uses a synthetic MDP and evaluates performance by regret over episodes; it does not describe dataset splits such as training, validation, or test sets. |
| Hardware Specification | Yes | All experiments are performed on a PC with Intel i7-9700K CPU. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | In our experiment, we set H = 10, K = 2500, δ = 0.35 and d = 13, thus A contains 1024 actions. [...] In detail, for LSVI-UCB-Batch, we run the algorithm for B = 10, 20, 30, 40, 50 respectively; for LSVI-UCB-RareSwitch, we set η = 2, 4, 8, 16, 32. (The determinant-based switching rule is sketched below.) |
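
For concreteness, the following is a minimal runnable sketch of the mechanism behind Algorithm 1 (LSVI-UCB-Batch): the K episodes are split into B batches, and the optimistic Q-estimates are recomputed, via regularized least-squares value iteration with a UCB bonus in the style of LSVI-UCB (Jin et al., 2020), only at the start of each batch. The toy tabular MDP with one-hot features is a hypothetical stand-in for the paper's Example 6.1, whose exact construction is not reproduced here, and the values of `S`, `A`, `lam`, and `beta` are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP treated as a linear MDP via one-hot features
# phi(s, a) = e_{(s, a)}. This is NOT the paper's Example 6.1; it is a
# hypothetical stand-in so the batching logic can be run end to end.
S, A, H = 5, 4, 10        # states, actions, horizon (H = 10 as in the paper)
K, B = 2500, 10           # episodes and batches (one of the reported settings)
d = S * A                 # feature dimension of the one-hot embedding
lam, beta = 1.0, 1.0      # ridge parameter and bonus scale (assumed values)

P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over s'
R = rng.uniform(size=(S, A))                # mean rewards in [0, 1]

def phi(s, a):
    v = np.zeros(d)
    v[s * A + a] = 1.0
    return v

def lsvi_ucb(history):
    """One backward least-squares value iteration pass with a UCB bonus:
    the per-update computation shared by LSVI-UCB and its batched variant."""
    Q = np.zeros((H + 1, S, A))
    for h in range(H - 1, -1, -1):
        Lam = lam * np.eye(d)
        target = np.zeros(d)
        for (s, a, r, s2) in history[h]:
            f = phi(s, a)
            Lam += np.outer(f, f)
            target += f * (r + Q[h + 1, s2].max())
        w = np.linalg.solve(Lam, target)
        Lam_inv = np.linalg.inv(Lam)
        for s in range(S):
            for a in range(A):
                f = phi(s, a)
                bonus = beta * np.sqrt(f @ Lam_inv @ f)
                Q[h, s, a] = min(f @ w + bonus, H)  # optimism, clipped at H
    return Q

history = [[] for _ in range(H)]
batch_starts = {b * (K // B) for b in range(B)}  # policy switches happen here only
Q = np.zeros((H + 1, S, A))

for k in range(K):
    if k in batch_starts:    # LSVI-UCB-Batch: recompute only at batch boundaries
        Q = lsvi_ucb(history)
    s = 0                    # fixed initial state
    for h in range(H):
        a = int(Q[h, s].argmax())      # act greedily w.r.t. the optimistic Q
        s2 = rng.choice(S, p=P[s, a])
        history[h].append((s, a, float(R[s, a]), s2))
        s = s2
```

The fully adaptive baseline LSVI-UCB corresponds to recomputing at every episode; the batched schedule caps the number of policy switches at B, which is exactly the adaptivity constraint the experiment varies over B = 10, 20, 30, 40, 50.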
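
LSVI-UCB-RareSwitch replaces the fixed batch schedule with a data-driven rule: the policy is recomputed only when the determinant of some per-step covariance matrix Λ_h has grown by a factor of η since the last update. Below is a sketch of that check, with illustrative names (`Lam_now` and `Lam_last` are the H current and last-update covariance matrices).

```python
import numpy as np

def should_switch(Lam_now, Lam_last, eta):
    """Switching rule of LSVI-UCB-RareSwitch: recompute the policy iff
    det(Lam_now[h]) > eta * det(Lam_last[h]) for some step h. Comparing
    log-determinants avoids overflow as the matrices accumulate data."""
    for now, prev in zip(Lam_now, Lam_last):
        if np.linalg.slogdet(now)[1] > np.log(eta) + np.linalg.slogdet(prev)[1]:
            return True
    return False
```

In the sketch above, this test (with η drawn from the reported grid 2, 4, 8, 16, 32) would replace the `k in batch_starts` check, with each Λ_h maintained incrementally and snapshotted whenever the policy is recomputed.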