Conservative State Value Estimation for Offline Reinforcement Learning

Authors: Liting Chen, Jie Yan, Zhengdao Shao, Lu Wang, Qingwei Lin, Saravanakumar Rajmohan, Thomas Moscibroda, Dongmei Zhang

NeurIPS 2023

Reproducibility assessment (variable: result, with the supporting LLM response):

Research Type: Experimental
  Evidence: "We evaluate in classic continuous control tasks of D4RL, showing that our method performs better than the conservative Q-function learning methods and is strongly competitive among recent SOTA methods." ... "Experimental evaluation on continuous control tasks of Gym [7] and Adroit [8] in D4RL [9] benchmarks, showing that CSVE performs better than prior methods based on conservative Q-value estimation, and is strongly competitive among main SOTA algorithms."

Researcher Affiliation: Collaboration
  Evidence: Liting Chen (McGill University, Montreal, Canada), Jie Yan (Microsoft, Beijing, China), Zhengdao Shao (University of Science and Technology of China, Hefei, China), Lu Wang (Microsoft, Beijing, China), Qingwei Lin (Microsoft, Beijing, China), Saravan Rajmohan (Microsoft 365, Seattle, USA), Thomas Moscibroda (Microsoft, Redmond, USA), Dongmei Zhang (Microsoft, Beijing, China)

Pseudocode: Yes
  Evidence: "Algorithm 1: CSVE based Offline RL Algorithm"

Open Source Code: Yes
  Evidence: "We implement our method based on an offline deep reinforcement learning library d3rlpy [34]. The code is available at: https://github.com/2023AnnonymousAuthor/csve"

Open Datasets: Yes
  Evidence: "We conduct experimental evaluations on a variety of classic continuous control tasks of Gym [7] and Adroit [8] in the D4RL [9] benchmark." ... "D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020."

Dataset Splits: No
  The paper mentions 'train' and 'test' in the context of its experiments but does not explicitly describe a validation split or a methodology for one (e.g., percentages, sample counts, or a cross-validation setup).

Hardware Specification: No
  The paper does not provide hardware details such as the GPU models, CPU types, or memory specifications used to run the experiments.

Software Dependencies: No
  The paper states "We implement our method based on an offline deep reinforcement learning library d3rlpy [34]" but does not give a version number for this library or for any other software dependency used in the experiments.

Experiment Setup: Yes
  Evidence: Table 3 (hyper-parameters of the CSVE evaluation):
    B = 5, number of ensembles in the dynamics model
    α = 10, controls the penalty on OOD states
    τ = 10, budget parameter in Eq. 8
    β: in the Gym domain, 3 for random and medium tasks, 0.1 for the other tasks; in the Adroit domain, 30 for human and cloned tasks, 0.01 for expert tasks
    γ = 0.99, discount factor
    H = 1 million steps for MuJoCo tasks, 0.1 million for Adroit tasks
    w = 0.005, target-network smoothing coefficient
    actor learning rate = 3e-4; critic learning rate = 1e-4