Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Authors: Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy, Wen Sun

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical Our work provides a computationally efficient algorithm for the linear Bellman complete setting that works for MDPs with large action spaces, random initial states, and random rewards but relies on the underlying dynamics to be deterministic. Our approach is based on randomization: we inject random noise into least squares regression problems to perform optimistic value iteration. Our key technical contribution is to carefully design the noise to only act in the null space of the training data to ensure optimism while circumventing a subtle error amplification issue.
Researcher Affiliation Collaboration Runzhe Wu Cornell University EMAIL Ayush Sekhari MIT EMAIL Akshay Krishnamurthy Microsoft Research EMAIL Wen Sun Cornell University EMAIL
Pseudocode Yes See Algorithm 1 for pseudocode. The input to the algorithm consists of three components. First, the noise variances, {σh}H h=1 and σR, control the scale of the random noise. Second, a D-optimal design (defined below) for the feature space. ... Algorithm 1 Null Space Randomization for Linear Bellman Completeness
Open Source Code No The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to any code repositories in the main text or appendix.
Open Datasets No The paper is theoretical in nature, focusing on algorithms and regret bounds for Reinforcement Learning settings. It does not describe or utilize any specific datasets for empirical evaluation, hence no information about public dataset access is provided.
Dataset Splits No As a theoretical paper, no experimental evaluations involving datasets are conducted. Therefore, the paper does not specify any training/test/validation dataset splits.
Hardware Specification No The paper presents a theoretical algorithm and its analysis without conducting any empirical experiments. Consequently, no specific hardware specifications (e.g., GPU/CPU models, memory) are mentioned.
Software Dependencies No The paper focuses on theoretical algorithm design and analysis. It does not provide details about software dependencies, such as libraries or solvers with specific version numbers, that would be needed for implementation.
Experiment Setup No The paper is a theoretical work that provides an algorithm and its regret bounds. It does not include an experimental section, and thus, no details about hyperparameters, training configurations, or other system-level settings for experiments are provided.