The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes
Authors: Pedro Pinto Santos, Alberto Sardinha, Francisco S. Melo
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Third, we provide a set of empirical results to support our claims, highlighting how the number of trajectories and the structure of the underlying GUMDP influence policy evaluation." (from Abstract) and, from Section 4.3 (Empirical results): "We now empirically assess the impact of different parameters in the mismatch between $f_{K,H}(\pi)$ and $f(\pi)$ for arbitrary fixed $\pi$. ... In Fig. 2, a set of plots displays the average finite trials objective function, $f(\hat{d}_{T_{K,H}})$, in comparison to the infinite trials objective $f(d_\pi)$, under GUMDP $\mathcal{M}_{f,1}$." |
| Researcher Affiliation | Academia | ¹INESC-ID, Lisbon, Portugal; ²Instituto Superior Técnico, Lisbon, Portugal; ³PUC-Rio, Rio de Janeiro, Brazil. All these are academic or research institutions. |
| Pseudocode | Yes | Algorithm 1, "Estimating $f_{K,H}(\pi)$ via samples": 1: Inputs: $N \in \mathbb{N}$ (num. of iterations), $K \in \mathbb{N}$ (num. of trajectories), $H \in \mathbb{N}$ (trajectories horizon), and $\gamma$ (discount factor). 2: $\hat{f}_0 = 0$ 3: for $n$ in $\{1, \ldots, N\}$ do ... |
| Open Source Code | Yes | The code used can be found in the following repository. |
| Open Datasets | No | "Throughout this paper, we make use of the GUMDPs depicted in Fig. 1, which are representative of three common tasks in the convex RL literature." The paper describes the structure of these GUMDPs ($\mathcal{M}_{f,1}$, $\mathcal{M}_{f,2}$, $\mathcal{M}_{f,3}$) but does not provide access information (link, citation, repository) for these or any other external datasets. |
| Dataset Splits | No | The paper discusses generating 'sampled trajectories' and varying the 'number of trials K' and 'trajectories length H' for simulations, rather than using static datasets with explicit training/test/validation splits. Therefore, it does not provide dataset split information in the conventional sense. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instances) used for running the experiments were provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiments were provided in the paper. |
| Experiment Setup | Yes | "Under $\mathcal{M}_{f,1}$, $\pi(\text{left} \mid s_0) = \pi(\text{right} \mid s_0) = 0.5$, and $\pi(\text{right} \mid s_1) = \pi(\text{left} \mid s_2) = 1$; for $\mathcal{M}_{f,2}$ and $\mathcal{M}_{f,3}$, $\pi$ is uniformly random. We consider 100 random seeds and report 95% bootstrapped confidence intervals (shaded areas in plots)." (from Section 4.3) and Algorithm 1, "Estimating $f_{K,H}(\pi)$ via samples": Inputs: $N \in \mathbb{N}$ (num. of iterations), $K \in \mathbb{N}$ (num. of trajectories), $H \in \mathbb{N}$ (trajectories horizon), and $\gamma$ (discount factor). |
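
The Pseudocode row quotes only the opening lines of Algorithm 1. The sketch below fills in the loop body under stated assumptions: each iteration samples $K$ trajectories of length $H$, forms the discounted empirical state-action occupancy $\hat{d}$, and folds $f(\hat{d})$ into a running mean that starts from $\hat{f}_0 = 0$. The 2-state GUMDP, the quadratic utility `f`, and all function names are hypothetical. They are not the paper's $\mathcal{M}_{f,i}$ or its released code. The exact occupancy $d_\pi$ is computed as well, so the finite-trials estimate can be compared with the infinite-trials value $f(d_\pi)$.

```python
import numpy as np


def sample_rows(probs, rng):
    """Draw one index per row of a (K, n) matrix of probabilities (inverse CDF)."""
    cdf = np.cumsum(probs, axis=1)
    u = rng.random(probs.shape[0])[:, None]
    # Clip guards against cdf[-1] falling just below 1 through rounding.
    return np.minimum((cdf <= u).sum(axis=1), probs.shape[1] - 1)


def empirical_occupancy(P, pi, mu0, gamma, K, H, rng):
    """Discounted state-action occupancy averaged over K sampled trajectories of length H."""
    S, A = pi.shape
    d_hat = np.zeros((S, A))
    s = sample_rows(np.broadcast_to(mu0, (K, S)), rng)  # K initial states
    for t in range(H):
        a = sample_rows(pi[s], rng)  # a_t ~ pi(.|s_t), one per trajectory
        np.add.at(d_hat, (s, a), (1 - gamma) * gamma**t / K)
        s = sample_rows(P[s, a], rng)  # s_{t+1} ~ P(.|s_t, a_t)
    return d_hat


def estimate_f_KH(f, P, pi, mu0, gamma, K, H, N, seed=0):
    """Monte Carlo estimate of f_{K,H}(pi): running mean of f(d_hat) over N iterations."""
    rng = np.random.default_rng(seed)
    f_hat = 0.0  # \hat f_0 = 0
    for n in range(1, N + 1):
        d_hat = empirical_occupancy(P, pi, mu0, gamma, K, H, rng)
        f_hat += (f(d_hat) - f_hat) / n  # incremental mean
    return f_hat


def exact_occupancy(P, pi, mu0, gamma):
    """Infinite-trials occupancy d_pi, from (I - gamma P_pi^T) d = (1 - gamma) mu0."""
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    d_s = np.linalg.solve(np.eye(len(mu0)) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi


# Hypothetical 2-state, 2-action GUMDP (not one of the paper's M_{f,i}):
# action a moves to state a w.p. 0.8 and otherwise leaves the state unchanged.
S, A, gamma = 2, 2, 0.9
P = np.zeros((S, A, S))
for s in range(S):
    for a in range(A):
        P[s, a, a] += 0.8
        P[s, a, s] += 0.2
pi = np.full((S, A), 0.5)  # uniformly random policy
mu0 = np.array([1.0, 0.0])  # always start in state 0


def f(d):
    """Convex utility: squared distance of d to the uniform occupancy."""
    return float(np.sum((d - 1.0 / d.size) ** 2))


print("f(d_pi) =", round(f(exact_occupancy(P, pi, mu0, gamma)), 5))
for K in (1, 10, 100):
    print("K =", K, "estimate =", round(estimate_f_KH(f, P, pi, mu0, gamma, K=K, H=100, N=100), 5))
```

Because this `f` is convex, Jensen's inequality places the finite-trials average above $f$ of the mean occupancy. For this quadratic utility, the gap equals the trace of the covariance of $\hat{d}$, which scales as $1/K$ for $K$ i.i.d. trajectories. As a result, the $K = 1$ estimate sits well above $f(d_\pi)$, while $K = 100$ nearly matches it.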
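
The Experiment Setup row reports 95% bootstrapped confidence intervals over 100 seeds but does not say which bootstrap variant was used. The following is a minimal sketch that assumes a percentile bootstrap of the mean across per-seed results. The helper `bootstrap_ci`, its parameters, and the synthetic per-seed numbers are hypothetical and are not values from the paper.

```python
import numpy as np


def bootstrap_ci(values, confidence=0.95, n_resamples=10_000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of per-seed results.

    Hypothetical helper: the paper states only that 95% bootstrapped intervals
    over 100 seeds are reported, not which bootstrap variant was used.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample the seeds with replacement, n_resamples times.
    idx = rng.integers(0, values.size, size=(n_resamples, values.size))
    resampled_means = values[idx].mean(axis=1)
    # Keep the central `confidence` mass of the bootstrap distribution of means.
    alpha = 1.0 - confidence
    low, high = np.quantile(resampled_means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(low), float(high)


# Synthetic stand-in for one objective value per seed (100 seeds); not paper data.
per_seed = np.random.default_rng(1).normal(loc=0.04, scale=0.01, size=100)
mean, low, high = bootstrap_ci(per_seed)
print(f"mean={mean:.4f}, 95% CI=[{low:.4f}, {high:.4f}]")
```

The percentile variant is the simplest choice. Bias-corrected variants such as BCa could give slightly different bands.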