Imitation Learning via Focused Satisficing

Authors: Rushit N. Shah, Nikolaos Agadakos, Synthia Sasulski, Ali Farajzadeh, Sanjiban Choudhury, Brian Ziebart

IJCAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We conduct experiments using a mix of simple, classic control environments (cartpole, lunarlander) and complex robotics environments (MuJoCo hopper, halfcheetah, walker) from OpenAI Gym [Brockman et al., 2016]. For each environment, we obtain 100 demonstrations from a suboptimal policy learned using PPO. This ensures that the majority of the resulting demonstrations are suboptimal and noisy. Human demonstrations for the lunarlander used in Section 3.7 are collected from non-expert human players using the joysticks on an Xbox 360 video game controller. Demonstration return statistics for environment-specific demonstration sets of varying quality are provided in Table 1."

Researcher Affiliation | Academia | "1 Department of Computer Science, University of Illinois Chicago; 2 Department of Computer Science, Cornell University. EMAIL, EMAIL, EMAIL"

Pseudocode | Yes | "Algorithm 1: Online subdominance policy gradient; Algorithm 2: Snippet-based subdominance policy gradient; Algorithm 3: Offline, joint stochastic optimization"

Open Source Code | No | The paper does not explicitly state that its own source code is released, nor does it link to a code repository for the described methodology. It mentions using Stable Baselines3, which is a third-party tool.

Open Datasets | Yes | "We conduct experiments using a mix of simple, classic control environments (cartpole, lunarlander) and complex robotics environments (MuJoCo hopper, halfcheetah, walker) from OpenAI Gym [Brockman et al., 2016]."

Dataset Splits | Yes | "We sort all demonstrations by their total (true) return and then choose a subset by retaining the best or worst 90%, 80%, 70%, or 60% of the original set. We use this demonstration subset to train T-REX and Online Min Sub FI."

Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU types, or memory amounts, used for running the experiments.

Software Dependencies | No | "We implement the policy optimization of Min Sub FI using Stable Baselines3 [Raffin et al., 2021]. Across all experiments, all baseline methods use the same base policy model paired with Stable Baselines3's implementation of the PPO algorithm [Schulman et al., 2017]." The paper names software packages such as Stable Baselines3 and OpenAI Gym, but does not provide version numbers for them.
Experiment Setup | Yes | "The experiments are not based on extensive hyperparameter tuning; rather, all policy networks use nearly the same hyperparameters (Table 2)." Table 2 ("Values of PPO hyperparameters for each environment") lists, for cartpole: learning rate 1e-4, entropy coefficient 0, clip range 0.2, minibatch size 512, horizon 2048, 10 epochs, and 2e6 total steps.
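The Dataset Splits row describes a concrete selection procedure: sort all demonstrations by total (true) return, then retain the best or worst 90%, 80%, 70%, or 60% of the set. A minimal sketch of that procedure is below; the function name and data layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def select_subset(demonstrations, returns, fraction, keep="best"):
    """Retain the best or worst `fraction` of demonstrations by total return.

    demonstrations: list of trajectories (any objects)
    returns: per-demonstration total (true) returns, same length
    fraction: e.g. 0.9, 0.8, 0.7, or 0.6 as in the paper's splits
    keep: "best" keeps the highest-return subset, "worst" the lowest
    """
    order = np.argsort(returns)  # indices sorted by return, ascending
    k = int(round(fraction * len(demonstrations)))
    idx = order[-k:] if keep == "best" else order[:k]
    return [demonstrations[i] for i in sorted(idx)]
```

Such a subset would then serve as the training set for T-REX and Online Min Sub FI, per the quoted description.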