reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes

Authors: Pedro Pinto Santos, Alberto Sardinha, Francisco S. Melo

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Third, we provide a set of empirical results to support our claims, highlighting how the number of trajectories and the structure of the underlying GUMDP influence policy evaluation. (from Abstract) and 4.3. Empirical results: We now empirically assess the impact of different parameters in the mismatch between f_{K,H}(π) and f(π) for arbitrary fixed π. ... In Fig. 2, a set of plots displays the average finite trials objective function, f(d̂_{T_K,H}), in comparison to the infinite trials objective f(d_π), under GUMDP M_{f,1}.
Researcher Affiliation	Academia	(1) INESC-ID, Lisbon, Portugal; (2) Instituto Superior Técnico, Lisbon, Portugal; (3) PUC-Rio, Rio de Janeiro, Brazil. All of these are academic or research institutions.
Pseudocode	Yes	Algorithm 1: Estimating f_{K,H}(π) via samples. 1: Inputs: N ∈ ℕ (num. of iterations), K ∈ ℕ (num. of trajectories), H ∈ ℕ (trajectories horizon), and γ (discount factor). 2: f̂_0 = 0 3: for n in {1, ..., N} do...
Open Source Code	Yes	The code used can be found in the following repository.
Open Datasets	No	Throughout this paper, we make use of the GUMDPs depicted in Fig. 1, which are representative of three common tasks in the convex RL literature. The paper describes the structure of these GUMDPs (M_{f,1}, M_{f,2}, M_{f,3}) but does not provide access information (link, citation, repository) for these or any other external datasets.
Dataset Splits	No	The paper discusses generating 'sampled trajectories' and varying the 'number of trials K' and 'trajectories length H' for simulations, rather than using static datasets with explicit training/test/validation splits. Therefore, it does not provide dataset split information in the conventional sense.
Hardware Specification	No	No specific hardware details (like GPU/CPU models, memory, or cloud instances) used for running the experiments were provided in the paper.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiments were provided in the paper.
Experiment Setup	Yes	Under M_{f,1}, π(left|s_0) = π(right|s_0) = 0.5, and π(right|s_1) = π(left|s_2) = 1; for M_{f,2} and M_{f,3}, π is uniformly random. We consider 100 random seeds and report 95% bootstrapped confidence intervals (shaded areas in plots). (from Section 4.3) and Algorithm 1: Estimating f_{K,H}(π) via samples. 1: Inputs: N ∈ ℕ (num. of iterations), K ∈ ℕ (num. of trajectories), H ∈ ℕ (trajectories horizon), and γ (discount factor).
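
The Algorithm 1 excerpt quoted in the table is truncated after the loop header, so the loop body is not available here. The following is only a minimal Python sketch of how a sample-based estimate of f_{K,H}(π) might look, under stated assumptions: each iteration samples K trajectories of length H, builds a discounted empirical state-action occupancy, evaluates f on it, and folds the result into a running mean. The function names, the tabular inputs `P`, `pi`, `p0`, the (1 - γ)/(1 - γ^H) normalization, and the toy two-state example are all hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def sample_index(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_trajectory(P, pi, p0, H, rng):
    """Sample H (state, action) pairs; P[s][a] is a next-state distribution."""
    s = sample_index(p0, rng)
    traj = []
    for _ in range(H):
        a = sample_index(pi[s], rng)
        traj.append((s, a))
        s = sample_index(P[s][a], rng)
    return traj


def empirical_occupancy(trajs, n_states, n_actions, gamma):
    """Discounted empirical occupancy averaged over the given trajectories.

    Assumption: weights are normalized by (1 - gamma) / (1 - gamma**H) so the
    truncated occupancy sums to 1. Requires 0 <= gamma < 1.
    """
    H = len(trajs[0])
    norm = (1.0 - gamma) / (1.0 - gamma ** H)
    d = [[0.0] * n_actions for _ in range(n_states)]
    for traj in trajs:
        for t, (s, a) in enumerate(traj):
            d[s][a] += norm * gamma ** t / len(trajs)
    return d


def estimate_f_KH(f, P, pi, p0, N, K, H, gamma, seed=0):
    """Monte Carlo estimate of E[f(d_hat)] over N batches of K trajectories."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    f_hat = 0.0  # matches the initialization f_hat_0 = 0 in the excerpt
    for n in range(1, N + 1):
        trajs = [sample_trajectory(P, pi, p0, H, rng) for _ in range(K)]
        d_hat = empirical_occupancy(trajs, n_states, n_actions, gamma)
        f_hat += (f(d_hat) - f_hat) / n  # incremental running mean
    return f_hat


def neg_entropy(d):
    """Example convex objective: negative entropy of the occupancy."""
    return sum(x * math.log(x) for row in d for x in row if x > 0.0)


# Hypothetical two-state, two-action MDP with a uniformly random policy.
P_toy = [[[0.5, 0.5], [0.1, 0.9]], [[1.0, 0.0], [0.3, 0.7]]]
pi_toy = [[0.5, 0.5], [0.5, 0.5]]
p0_toy = [1.0, 0.0]

for K in (1, 10):
    est = estimate_f_KH(neg_entropy, P_toy, pi_toy, p0_toy,
                        N=200, K=K, H=50, gamma=0.9, seed=0)
    print(f"K={K}: estimated f_KH = {est:.4f}")
```

Because f is applied to the occupancy before averaging, a nonlinear f generally gives different estimates for different K, which is the kind of mismatch the quoted Section 4.3 examines. For a linear f the running mean would not depend on K in expectation.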