Constrained episodic reinforcement learning in concave-convex and knapsack settings
Authors: Kianté Brantley, Miro Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in constrained episodic benchmarks. |
| Researcher Affiliation | Collaboration | Kianté Brantley University of Maryland EMAIL; Miroslav Dudík Microsoft Research EMAIL; Thodoris Lykouris Microsoft Research EMAIL; Sobhan Miryoosefi Princeton University EMAIL; Max Simchowitz UC Berkeley EMAIL; Aleksandrs Slivkins Microsoft Research EMAIL; Wen Sun Cornell University EMAIL |
| Pseudocode | No | The paper describes algorithms and their components (e.g., CONRL, CONPLANNER) and how to solve optimization problems as linear programs, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/miryoosefi/ConRL |
| Open Datasets | Yes | We run our experiments on two grid-world environments Mars rover (Tessler et al., 2019) and Box (Leike et al., 2017). |
| Dataset Splits | No | The paper describes running experiments on grid-world environments and training over a number of trajectories, but it does not specify traditional dataset splits (e.g., training, validation, test percentages or counts) as commonly seen in supervised learning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | The episode horizon H is 30 and the agent's action is perturbed with probability 0.1 to a random action. APPROPO focuses on the feasibility problem, so it requires specifying a lower bound on the reward, which we set to 0.3 for Mars rover and 0.1 for Box. |
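The reported setup values can be collected into a minimal configuration sketch. This is illustrative only: the dictionary keys below are hypothetical and do not come from the authors' ConRL repository.

```python
# Hypothetical experiment configuration assembled from the values reported
# in the paper; key names are illustrative, not taken from the ConRL code.
EXPERIMENT_CONFIG = {
    "episode_horizon": 30,             # H = 30
    "action_perturbation_prob": 0.1,   # action replaced by a random one w.p. 0.1
    # APPROPO solves a feasibility problem, so it needs a per-environment
    # lower bound on the reward:
    "reward_lower_bound": {
        "mars_rover": 0.3,
        "box": 0.1,
    },
}
```

Such a sketch makes it easy to check that all parameters needed to rerun the experiments are actually reported in the paper.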