Learning Utilities from Demonstrations in Markov Decision Processes

Authors: Filippo Lazzati, Alberto Maria Metelli

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "6. Numerical Simulations: In this section, we present proof-of-concept experiments using data collected from lab members to provide empirical evidence to support both our model and algorithms."
Researcher Affiliation | Academia | "Politecnico di Milano, Milan, Italy. Correspondence to: Filippo Lazzati <EMAIL>."
Pseudocode | Yes | "Algorithm 1 CATY-UL. Input: data {D^E_i}_i, threshold, utility U, discretization ϵ0, dynamics {p_i}_i"
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code or a link to a code repository.
Open Datasets | No | "We asked 15 participants to describe the actions they would play in an MDP with horizon H = 5 (see Appendix F), varying the state, the stage, and the cumulative reward collected. The reward has a monetary interpretation. To answer the questions, the participants were provided with complete information about the MDP. The data collected is not personal."
Dataset Splits | No | "We asked 15 participants to describe the actions they would play in an MDP... We consider the policy of the 10th participant (chosen arbitrarily) in the survey, and we execute TRACTOR-UL multiple times with varying values of the input parameters..."
Hardware Specification | Yes | "The experiment was conducted in a few hours on a personal computer with an AMD Ryzen 5 5500U processor with Radeon Graphics (2.10 GHz) and 8.00 GB of RAM."
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for its methodology.
Experiment Setup | Yes | "We always use K = 10000 trajectories for estimating the return distribution of the 10th participant's policy, and the return distributions of the optimal policies computed along the way; we make 5 runs with each combination of parameters with different seeds. We execute for T = 70 iterations using Lipschitz constant L = 10... As initial utility U_0, we try Usqrt, Usquare, and Ulinear (see Appendix F.3), and as learning rates we try 0.01, 0.5, 5, 100, 1000, 10000."
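The experiment setup above describes a grid of 3 initial utilities × 6 learning rates × 5 seeds, i.e. 90 runs of T = 70 iterations each. A minimal sketch of that sweep, where `run_tractor_ul` is a hypothetical placeholder and not the authors' implementation:

```python
from itertools import product

# Sweep described in the Experiment Setup row (values taken from the quote).
INITIAL_UTILITIES = ["Usqrt", "Usquare", "Ulinear"]
LEARNING_RATES = [0.01, 0.5, 5, 100, 1000, 10000]
SEEDS = range(5)       # 5 runs per parameter combination
T = 70                 # iterations per run
L = 10                 # Lipschitz constant
K = 10_000             # trajectories per return-distribution estimate

def run_tractor_ul(u0, lr, seed, n_iters=T):
    """Placeholder for a single TRACTOR-UL run; returns its configuration."""
    return {"u0": u0, "lr": lr, "seed": seed, "iters": n_iters}

runs = [run_tractor_ul(u0, lr, s)
        for u0, lr, s in product(INITIAL_UTILITIES, LEARNING_RATES, SEEDS)]
print(len(runs))  # 3 * 6 * 5 = 90 configurations
```

This only enumerates configurations; the actual per-run computation (estimating return distributions from K trajectories and updating the utility) is in the paper's Section 6 and Appendix F.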