reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Authors: Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy, Wen Sun

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our work provides a computationally efficient algorithm for the linear Bellman complete setting that works for MDPs with large action spaces, random initial states, and random rewards but relies on the underlying dynamics to be deterministic. Our approach is based on randomization: we inject random noise into least squares regression problems to perform optimistic value iteration. Our key technical contribution is to carefully design the noise to only act in the null space of the training data to ensure optimism while circumventing a subtle error amplification issue.
Researcher Affiliation	Collaboration	Runzhe Wu Cornell University EMAIL Ayush Sekhari MIT EMAIL Akshay Krishnamurthy Microsoft Research EMAIL Wen Sun Cornell University EMAIL
Pseudocode	Yes	See Algorithm 1 for pseudocode. The input to the algorithm consists of three components. First, the noise variances, {σh}H h=1 and σR, control the scale of the random noise. Second, a D-optimal design (defined below) for the feature space. ... Algorithm 1 Null Space Randomization for Linear Bellman Completeness
Open Source Code	No	The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to any code repositories in the main text or appendix.
Open Datasets	No	The paper is theoretical in nature, focusing on algorithms and regret bounds for Reinforcement Learning settings. It does not describe or utilize any specific datasets for empirical evaluation, hence no information about public dataset access is provided.
Dataset Splits	No	As a theoretical paper, no experimental evaluations involving datasets are conducted. Therefore, the paper does not specify any training/test/validation dataset splits.
Hardware Specification	No	The paper presents a theoretical algorithm and its analysis without conducting any empirical experiments. Consequently, no specific hardware specifications (e.g., GPU/CPU models, memory) are mentioned.
Software Dependencies	No	The paper focuses on theoretical algorithm design and analysis. It does not provide details about software dependencies, such as libraries or solvers with specific version numbers, that would be needed for implementation.
Experiment Setup	No	The paper is a theoretical work that provides an algorithm and its regret bounds. It does not include an experimental section, and thus, no details about hyperparameters, training configurations, or other system-level settings for experiments are provided.