Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
Authors: Mark Beliaev, Ramtin Pedarsani
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness, making it a versatile solution for learning from suboptimal demonstrations. [...] In this section we evaluate how IRLEED performs when learning from suboptimal demonstrations, using experiments in both online and offline IL settings, with simulated and human-generated data. |
| Researcher Affiliation | Academia | University of California, Santa Barbara EMAIL, EMAIL |
| Pseudocode | No | The paper describes the method using mathematical equations (Eq. 1-6) and prose, outlining the iterative approach and gradient computations (Eq. 8-10). However, there is no clearly labeled 'Algorithm' or 'Pseudocode' block with structured steps. |
| Open Source Code | Yes | Code: https://github.com/mbeliaev1/IRLEED |
| Open Datasets | Yes | human-generated data... dataset B: collected data using adept human players (Kurin et al. 2017) |
| Dataset Splits | No | The paper mentions the number of trajectories used (e.g., 'collecting 40 trajectories from each policy') and seeds ('using 100 seeds for each setting', '30 seeds for each dataset setting'), but does not provide explicit percentages or counts for training, validation, or test splits. It describes how the data was generated or used for runs, not how it was partitioned for evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory amounts) used for conducting the experiments. |
| Software Dependencies | No | The paper mentions utilizing 'codebases provided by the authors to implement ILEED and IQ' and creating 'IRLEED ontop of the IQ algorithm' [sic], but it does not specify any programming-language or library versions (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiments. |
| Experiment Setup | No | For further implementation details refer to the Appendix found in our extended version. |