Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Model-Based Offline Planning

Authors: Arthur Argenson, Gabriel Dulac-Arnold

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the performance of our algorithm, Model-Based Offline Planning (MBOP) on a series of robotics-inspired tasks, and demonstrate its ability to leverage planning to respect environmental constraints.
Researcher Affiliation | Industry | Arthur Argenson EMAIL Google Research Gabriel Dulac-Arnold EMAIL Google Research
Pseudocode | Yes | Algorithm 1 High-Level MBOP-Policy; Algorithm 2 MBOP-Trajopt
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the code for the described methodology. It only mentions that 'Accompanying videos are available here' and 'All non-standard datasets will be available publicly'.
Open Datasets | Yes | We use standard datasets from the RL Unplugged (RLU) (Gulcehre et al., 2020) and D4RL (Fu et al., 2020) papers.
Dataset Splits | Yes | On all datasets, training is performed on 90% of data and 10% is used for validation.
Hardware Specification | Yes | We calculate the average control frequency of MBOP on the RLU Walker task using a single Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz core and an Nvidia 1080TI ... Execution speeds on the RLU Walker task are represented in Table 9. ... on a Tesla P100 using a single core of a Xeon 2200 MHz equivalent processor.
Software Dependencies | No | The paper describes the software components (e.g., neural networks) but does not provide specific version numbers for any libraries, frameworks, or environments used.
Experiment Setup | Yes | The full set of parameters for each experiment can be found in the Appendix Sec. 5.2. ... # FC Layers: 2; Size FC Layers: 500; # Ensemble Networks: 3; Learning Rate: 0.001; Batch Size: 512; # Epochs: 40
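
As an illustration of the Dataset Splits and Experiment Setup rows, the quoted values can be gathered into a small configuration object together with a 90%/10% train/validation split. This is a minimal sketch and not the authors' code: the names `MBOPTrainConfig` and `split_train_validation` are hypothetical, the shuffled per-item split is an assumption (the quoted text does not say how the 10% is selected), and the quoted hyperparameters may apply to only some of the experiments listed in the paper's appendix.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MBOPTrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    num_fc_layers: int = 2           # "# FC Layers : 2"
    fc_layer_size: int = 500         # "Size FC Layers : 500"
    num_ensemble_networks: int = 3   # "# Ensemble Networks : 3"
    learning_rate: float = 0.001     # "Learning Rate : 0.001"
    batch_size: int = 512            # "Batch Size : 512"
    num_epochs: int = 40             # "# Epochs : 40"
    train_fraction: float = 0.9      # 90% training, 10% validation


def split_train_validation(items, train_fraction, seed=0):
    """Shuffle item indices and split them into train and validation lists.

    Assumption: a random per-item split; the source only states the ratio.
    """
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    cut = int(round(len(items) * train_fraction))
    train = [items[i] for i in indices[:cut]]
    validation = [items[i] for i in indices[cut:]]
    return train, validation


config = MBOPTrainConfig()
transitions = list(range(1000))  # placeholder standing in for dataset items
train_set, validation_set = split_train_validation(
    transitions, config.train_fraction
)
print(len(train_set), len(validation_set))
```

Fixing the seed keeps the split reproducible across runs, which matters for a report that lists library versions as undocumented.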