Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
Authors: Mark Beliaev, Ramtin Pedarsani
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness, making it a versatile solution for learning from suboptimal demonstrations. [...] In this section we evaluate how IRLEED performs when learning from suboptimal demonstrations, using experiments in both online and offline IL settings, with simulated and human-generated data. |
| Researcher Affiliation | Academia | University of California, Santa Barbara EMAIL, EMAIL |
| Pseudocode | No | The paper describes the method using mathematical equations (Eq. 1-6) and prose, outlining the iterative approach and gradient computations (Eq. 8-10). However, there is no clearly labeled 'Algorithm' or 'Pseudocode' block with structured steps. |
| Open Source Code | Yes | Code: https://github.com/mbeliaev1/IRLEED |
| Open Datasets | Yes | human-generated data... dataset B: collected data using adept human players (Kurin et al. 2017) |
| Dataset Splits | No | The paper mentions the number of trajectories used (e.g., 'collecting 40 trajectories from each policy') and seeds ('using 100 seeds for each setting', '30 seeds for each dataset setting'), but does not provide explicit percentages or counts for training, validation, or test splits. It describes how the data was generated or used for runs, not how it was partitioned for evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory amounts) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions utilizing 'codebases provided by the authors to implement ILEED and IQ' and creating 'IRLEED ontop of the IQ algorithm' [sic], but it does not specify any programming-language or library versions (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments. |
| Experiment Setup | No | For further implementation details refer to the Appendix found in our extended version. |