Bayesian Methods for Constraint Inference in Reinforcement Learning
Authors: Dimitris Papadimitriou, Usman Anwar, Daniel S. Brown
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that BICRL outperforms pre-existing constraint learning approaches, leading to more accurate constraint inference and consequently safer policies. We carry out simulations in deterministic state space grid world environments to compare our method to the Greedy Iterative Constraint Inference (GICI) method proposed by Scobee & Sastry (2019). Table 1: False Positive, False Negative and Precision classification rates for GICI and BICRL for varying levels of transition dynamics noise. Results averaged over 10 runs. |
| Researcher Affiliation | Academia | Dimitris Papadimitriou (UC Berkeley); Usman Anwar (University of Cambridge); Daniel S. Brown (University of Utah) |
| Pseudocode | Yes | Algorithm 1: BICRL; Algorithm 2: Active Constraint Learning; Algorithm 3: BDPR; Algorithm 4: BCPR; Algorithm 5: Feature-Based BICRL |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Figure 10 shows the floor plan of a single bedroom apartment obtained from the iGibson dataset (Li et al., 2021). |
| Dataset Splits | No | The paper discusses using a certain number of expert demonstrations and evaluating generalization to a 'new unseen environment', but does not provide specific percentages or sample counts for training, validation, and test splits for any dataset in the traditional sense. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Table 3: Hyperparameters of Sections 4.1-4.3 simulations (values listed as Sec. 4.1 / Sec. 4.2 / Sec. 4.3). # Expert trajectories: 100 / 100 / 20; n: 80 / 80 / 80; γ: 0.95 / 0.95 / 0.95; ϵ: 0.0 / {0.0, 0.01, 0.05} / 0.0; β: 1 / 1 / 1; K: 2000 / 4000 / 200; σ: 1 / 1 / 1; fr: 50 / 50 / 50 |