Bayesian Methods for Constraint Inference in Reinforcement Learning

Authors: Dimitris Papadimitriou, Usman Anwar, Daniel S. Brown

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that BICRL outperforms pre-existing constraint learning approaches, leading to more accurate constraint inference and consequently safer policies. We carry out simulations in deterministic state space grid world environments to compare our method to the Greedy Iterative Constraint Inference (GICI) method proposed by Scobee & Sastry (2019)." (Table 1: False Positive, False Negative, and Precision classification rates for GICI and BICRL for varying levels of transition dynamics noise; results averaged over 10 runs.)
Researcher Affiliation | Academia | Dimitris Papadimitriou (UC Berkeley), Usman Anwar (University of Cambridge), Daniel S. Brown (University of Utah)
Pseudocode | Yes | Algorithm 1 (BICRL), Algorithm 2 (Active Constraint Learning), Algorithm 3 (BDPR), Algorithm 4 (BCPR), Algorithm 5 (Feature-Based BICRL)
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "Figure 10 shows the floor plan of a single bedroom apartment obtained from the iGibson dataset (Li et al., 2021)."
Dataset Splits | No | The paper discusses using a certain number of expert demonstrations and evaluating generalization to a "new unseen environment", but does not provide specific percentages or sample counts for training, validation, and test splits of any dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or processor types) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | Table 3 of the paper lists the hyperparameters of the Sections 4.1-4.3 simulations, reproduced below:

    Hyperparameter          Sec. 4.1   Sec. 4.2          Sec. 4.3
    # Expert trajectories   100        100               20
    n                       80         80                80
    γ                       0.95       0.95              0.95
    ϵ                       0.0        0.0, 0.01, 0.05   0.0
    β                       1          1                 1
    K                       2000       4000              200
    σ                       1          1                 1
    fr                      50         50                50
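For anyone re-running these experiments, the Table 3 hyperparameters can be collected into a small configuration sketch. The dictionary key names below are our own shorthand transliterations of the table's symbols (e.g. `gamma` for γ, `eps` for ϵ); the paper itself does not define this structure, and we do not interpret what each symbol means beyond the table.

```python
# Hyperparameters transcribed from Table 3 of the paper, keyed by section.
# Key names are our own ASCII shorthand for the table's symbols; the values
# are copied verbatim from the table. eps is a list because Sec. 4.2 sweeps
# over several transition-noise levels.
TABLE3_HYPERPARAMS = {
    "sec_4_1": {"n_expert_trajectories": 100, "n": 80, "gamma": 0.95,
                "eps": [0.0], "beta": 1, "K": 2000, "sigma": 1, "fr": 50},
    "sec_4_2": {"n_expert_trajectories": 100, "n": 80, "gamma": 0.95,
                "eps": [0.0, 0.01, 0.05], "beta": 1, "K": 4000, "sigma": 1,
                "fr": 50},
    "sec_4_3": {"n_expert_trajectories": 20, "n": 80, "gamma": 0.95,
                "eps": [0.0], "beta": 1, "K": 200, "sigma": 1, "fr": 50},
}
```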