Calibrated Value-Aware Model Learning with Probabilistic Environment Models

Authors: Claas A Voelcker, Anastasiia Pedan, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-Massoud Farahmand

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper has two parts, each with a theoretical and empirical section. We answer two questions about the (m, b)-VAML family: (a) What variants of the (m, b)-VAML losses are well-calibrated to recover correct models and value functions? (b) Do we observe problems with uncalibrated losses when using standard architectures, especially deterministic latent-space models? ... To test our findings, we run the (m, b)-VAML losses on small, finite state Garnet problems (Bhatnagar et al., 2007). ... We examine the impact of the calibrated losses on a subset of DMC environments (Tunyasuvunakool et al., 2020) encompassing 7 total tasks across the humanoid and dog domains. ... We plot aggregated performance over 20 random seeds with 95% CI, estimated with stratified percentile bootstrap (Patterson et al., 2024).
Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of Toronto, Canada; (2) Vector Institute, Toronto, Canada; (3) Igor Sikorsky Kyiv Polytechnic Institute, Kyiv, Ukraine; (4) Cohere, Toronto, Canada; (5) Ubisoft, Montreal, Canada; (6) Polytechnique Montreal, Canada; (7) MILA, Montreal, Canada.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; it primarily presents mathematical derivations and experimental results.
Open Source Code | Yes | Code is provided at https://github.com/adaptive-agents-lab/CVAML.
Open Datasets | Yes | To test our findings, we run the (m, b)-VAML losses on small, finite state Garnet problems (Bhatnagar et al., 2007). ... We examine the impact of the calibrated losses on a subset of DMC environments (Tunyasuvunakool et al., 2020) encompassing 7 total tasks across the humanoid and dog domains.
Dataset Splits | No | The paper describes generating Garnet environments and collecting data through interaction with DMC reinforcement learning environments, rather than using fixed datasets with predefined train/validation/test splits. It mentions mixing model-generated and real environment data in minibatches during training, but specifies no fixed dataset splits for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used to run the experiments; the acknowledgments refer to compute resources only in general terms.
Software Dependencies | No | The paper mentions several algorithms and libraries (e.g., the MuZero loss, TD-MPC, the RLiable library) but does not provide version numbers for any software dependencies needed to replicate the experiments.
Experiment Setup | Yes | Hyperparameters can be found in Table 2:
- Discount γ: 0.99
- Actor learning rate α_π: 0.0003
- Critic learning rate α_Q: 0.0003
- Model learning rate α_p̂: 0.0003
- Encoder learning rate α_φ: 0.0001
- Model rollout depth m: 1
- Model bootstrap depth b: varied (0 and 1)
- Model samples k: varied (1 and 4)
- Proportion real ρ: 0.9
- Latent dimension: 512
- Gradient clipping: 10
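The Garnet problems used in the paper's small-scale experiments are randomly generated finite MDPs parameterized by the number of states, actions, and a branching factor (Bhatnagar et al., 2007). A minimal sketch of such a generator follows; the function name, default sizes, and reward distribution are illustrative assumptions, not the paper's exact generator.

```python
import numpy as np

def make_garnet(n_states=20, n_actions=4, branching=3, seed=0):
    """Sample a random finite MDP in the Garnet family (illustrative sketch).

    Each (state, action) pair transitions to `branching` distinct next
    states, with probabilities drawn uniformly from the simplex; rewards
    are sampled i.i.d. from a standard normal (an assumption here).
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            # Uniform point on the simplex: gaps between sorted uniforms.
            cuts = np.sort(rng.uniform(size=branching - 1))
            probs = np.diff(np.concatenate([[0.0], cuts, [1.0]]))
            P[s, a, successors] = probs
    R = rng.normal(size=(n_states, n_actions))
    return P, R
```

The returned transition tensor `P` has one valid probability distribution per (state, action) pair, which makes it easy to compute exact value functions for checking calibration of model-learning losses on small problems.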
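The aggregation described in the Research Type row (95% CIs over 20 seeds via percentile bootstrap, per Patterson et al., 2024) can be sketched as below. This is a simple, unstratified version; the paper's stratified variant, which resamples within each task before aggregating, is not reproduced here.

```python
import numpy as np

def percentile_bootstrap_ci(returns, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean return over seeds (sketch).

    returns: array of shape (n_seeds,), one final score per seed.
    Resamples seeds with replacement and takes percentiles of the
    resulting bootstrap distribution of the mean.
    """
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    boot_means = np.empty(n_boot)
    for i in range(n_boot):
        sample = rng.choice(returns, size=n, replace=True)
        boot_means[i] = sample.mean()
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return returns.mean(), (lo, hi)
```

With 20 seeds per configuration, the bootstrap avoids the normality assumption of a t-interval, which matters because per-seed returns in deep RL are often heavy-tailed.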
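The hyperparameters reported in Table 2 can be collected into a small config object for replication. The class and field names below are illustrative, not taken from the released code; only the values come from the paper.

```python
from dataclasses import dataclass

@dataclass
class CVAMLConfig:
    """Hyperparameters from Table 2 of the paper (field names are ours)."""
    discount: float = 0.99          # γ
    actor_lr: float = 3e-4          # α_π
    critic_lr: float = 3e-4         # α_Q
    model_lr: float = 3e-4          # α_p̂
    encoder_lr: float = 1e-4        # α_φ
    rollout_depth: int = 1          # m
    bootstrap_depth: int = 0        # b, varied over {0, 1}
    model_samples: int = 1          # k, varied over {1, 4}
    proportion_real: float = 0.9    # ρ, share of real data per minibatch
    latent_dim: int = 512
    grad_clip: float = 10.0
```

The varied settings (`bootstrap_depth`, `model_samples`) default here to one of their swept values and would be overridden per run, e.g. `CVAMLConfig(bootstrap_depth=1, model_samples=4)`.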