Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Weighted model estimation for offline model-based reinforcement learning
Authors: Toru Hishinuma, Kei Senda
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Numerical experiments demonstrate the effectiveness of weighting with the artificial weight." (Section 6, Numerical Experiment) |
| Researcher Affiliation | Academia | Toru Hishinuma, Kyoto University; Kei Senda, Kyoto University |
| Pseudocode | Yes | Algorithm 1 Weighted model estimation for policy evaluation (full version). |
| Open Source Code | No | The paper mentions modifying existing code ('This paper implements SAC by modifying the implementation code by [36]') but does not explicitly state that the source code for their own methodology is made publicly available or provide a link to it. |
| Open Datasets | Yes | "This paper studies policy optimization on the D4RL Benchmark [33] based on the MuJoCo simulator [34]." |
| Dataset Splits | No | The paper mentions using the D4RL Benchmark datasets but does not explicitly provide specific training/validation/test dataset splits, such as percentages or sample counts, within its text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or any other computer specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch (implicitly via a reference to a PyTorch SAC implementation) and the MuJoCo simulator, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The agent uses Pθ represented by two-layer neural networks with 8 units and tanh activation. |
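The reported setup (two-layer networks with 8 tanh units) can be sketched minimally as below. This is only an illustration of the stated architecture: the input/output dimensions, the initializer, and the function names `init_params`/`forward` are assumptions, not details from the paper.

```python
import numpy as np

def init_params(in_dim, out_dim, hidden=8, seed=0):
    # Hypothetical initializer; only the 8 hidden units per layer come from the paper.
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((in_dim, hidden)) * 0.1
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, hidden)) * 0.1
    b2 = np.zeros(hidden)
    W3 = rng.standard_normal((hidden, out_dim)) * 0.1
    b3 = np.zeros(out_dim)
    return W1, b1, W2, b2, W3, b3

def forward(x, params):
    # Two hidden layers of 8 units, each followed by tanh, then a linear output head.
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return h2 @ W3 + b3
```

In the paper the corresponding model is implemented with PyTorch (via the referenced SAC code); the numpy version above only mirrors the layer shapes.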