Reward Distance Comparisons Under Transition Sparsity

Authors: Clement Nyanhongo, Bruno Miranda Henrique, Eugene Santos Jr.

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide theoretical justification for SRRD's robustness and conduct experiments to demonstrate its practical efficacy across multiple domains. ... Empirical results highlight SRRD's superior performance, as evidenced by its ability to find higher similarity between rewards generated from the same agents and higher variation between rewards from different agents."
Researcher Affiliation | Academia | Clement Nyanhongo (EMAIL), Thayer School of Engineering, Dartmouth College; Bruno Miranda Henrique (EMAIL), Thayer School of Engineering, Dartmouth College; Eugene Santos Jr. (EMAIL), Thayer School of Engineering, Dartmouth College
Pseudocode | Yes | C.1 Experiment 1: Transition Sparsity Pseudocode; Algorithm 1: Analyzing the effect of limited sampling on reward distance
Open Source Code | No | The paper does not provide access to source code for the methodology it describes. It only mentions third-party code used for the MaxEnt and AIRL implementations: "Maxent and AIRL implementations adapted from: https://github.com/HumanCompatibleAI/imitation (Gleave et al., 2022)".
Open Datasets | Yes | "Robomimic, an open-source dataset of robotics manipulation tasks incorporating both human and simulated demonstrations (Mandlekar et al., 2021); Montezuma's Revenge, an Atari benchmark dataset with human demonstrations for the Montezuma's Revenge game (Kurin et al., 2017); StarCraft II, a simulation of combat scenarios where a controlled multiagent team aims to defeat a default AI enemy team (Vinyals et al., 2019); ... and MIMIC-IV, a real-world de-identified electronic health dataset for patients admitted to an emergency or intensive care unit at Beth Israel Deaconess Medical Center in Boston, MA (Johnson et al., 2023)."
Dataset Splits | Yes | "We select a training-to-test set ratio of 70:30, and repeat this experiment 200 times."
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "In this experiment, we train a k-nearest neighbors (k-NN) classifier to classify unlabeled agent trajectories by indirectly using computed rewards, to identify the agents that produced these trajectories. ... grid-search is used to identify candidate values for k and γ, and two-fold cross-validation (using Rtrain) is used to optimize hyper-parameters based on accuracy. ... We select a training-to-test set ratio of 70:30, and repeat this experiment 200 times."

Table 9: Reward Learning Parameters Across Domains

AIRL: Trajectories/run: 5; RL Algorithm: PPO; Discount (γ): 0.9; Reward Network MLP Hidden Size: [256, 128]; Learning Rate: 10^-4; Time Steps: 10^5; Generator Batch Size: 2048; Discriminator Batch Size: 256
MAXENT: Trajectories/run: 5; RL Algorithm: PPO; Discount (γ): 0.9; Reward Network MLP Hidden Size: [256, 128]; Learning Rate: 10^-4
PTIRL: Target Trajectories/run: 5; Non-Target Trajectories/run: 10; Max Reward Cap: +100; Min Reward Cap: -100; LP Solver: CPLEX
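The classification protocol quoted above (k-NN over reward-derived features, grid-search tuned by two-fold cross-validation, a 70:30 split repeated many times) can be sketched roughly as follows. This is a hypothetical illustration with synthetic data, not the authors' code: the feature construction (one discounted return per trajectory), the two-agent setup, the fixed γ = 0.9, and all names here are assumptions for demonstration only.

```python
# Hedged sketch of the experiment setup: classify which agent produced
# each trajectory using only a reward-derived feature. Synthetic data;
# the paper also grid-searches gamma, which is held fixed here.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def discounted_return(rewards, gamma):
    """Discounted sum of a per-step reward sequence (one scalar feature)."""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.dot(discounts, rewards))

# Synthetic stand-in: 2 agents, 100 trajectories each, 20 steps per
# trajectory. Agent identity shifts the mean step reward, so the
# discounted return is informative about which agent acted.
step_rewards = np.concatenate([
    rng.normal(loc=mean, scale=1.0, size=(100, 20)) for mean in (0.0, 0.5)
])
labels = np.repeat([0, 1], 100)

accs = []
for trial in range(20):  # the paper repeats 200 times; fewer here for speed
    X = np.array([[discounted_return(r, 0.9)] for r in step_rewards])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.30, stratify=labels, random_state=trial)
    # Grid-search over k with two-fold cross-validation, scored by accuracy.
    search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": [1, 3, 5, 7]},
                          cv=2, scoring="accuracy")
    search.fit(X_tr, y_tr)
    accs.append(search.score(X_te, y_te))

print(f"mean test accuracy over trials: {np.mean(accs):.2f}")
```

Because the two synthetic agents differ in mean step reward, the classifier should separate them well above chance; with real learned rewards, the same pipeline would instead take the rewards produced by each IRL method as input.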