Calibrated Value-Aware Model Learning with Probabilistic Environment Models

Authors: Claas A Voelcker, Anastasiia Pedan, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-Massoud Farahmand

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper has two parts, each with a theoretical and empirical section. We answer two questions about the (m, b)-VAML family: (a) What variants of the (m, b)-VAML losses are well-calibrated to recover correct models and value functions? (b) Do we observe problems with uncalibrated losses when using standard architectures, especially deterministic latent-space models? ... To test our findings, we run the (m, b)-VAML losses on small, finite state Garnet problems (Bhatnagar et al., 2007). ... We examine the impact of the calibrated losses on a subset of DMC environments (Tunyasuvunakool et al., 2020) encompassing 7 total tasks across the humanoid and dog domains. ... We plot aggregated performance over 20 random seeds with 95% CI, estimated with stratified percentile bootstrap (Patterson et al., 2024).
Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of Toronto, Canada; (2) Vector Institute, Toronto, Canada; (3) Igor Sikorsky Kyiv Polytechnic Institute, Kyiv, Ukraine; (4) Cohere, Toronto, Canada; (5) Ubisoft, Montreal, Canada; (6) Polytechnique Montreal, Canada; (7) MILA, Montreal, Canada.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; it primarily presents mathematical derivations and experimental results.
Open Source Code | Yes | Code is provided at https://github.com/adaptive-agents-lab/CVAML.
Open Datasets | Yes | To test our findings, we run the (m, b)-VAML losses on small, finite state Garnet problems (Bhatnagar et al., 2007). ... We examine the impact of the calibrated losses on a subset of DMC environments (Tunyasuvunakool et al., 2020) encompassing 7 total tasks across the humanoid and dog domains.
Dataset Splits | No | The paper describes generating Garnet environments and collecting data through interaction with DMC reinforcement learning environments, rather than using fixed datasets with predefined train/validation/test splits. It mentions mixing model-generated and real environment data in minibatches during training, but specifies no fixed dataset splits for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used to run the experiments; the acknowledgments refer to compute resources only in general terms.
Software Dependencies | No | The paper mentions several algorithms and libraries (e.g., the MuZero loss, TD-MPC, the RLiable library) but does not provide version numbers for any software dependencies needed to replicate the experiments.
Experiment Setup | Yes | Hyperparameters can be found in Table 2:
- Discount γ: 0.99
- Actor learning rate α_π: 0.0003
- Critic learning rate α_Q: 0.0003
- Model learning rate α_p̂: 0.0003
- Encoder learning rate α_φ: 0.0001
- Model rollout depth m: 1
- Model bootstrap depth b: varied (0 and 1)
- Model samples k: varied (1 and 4)
- Proportion real ρ: 0.9
- Latent dimension: 512
- Gradient clipping: 10
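The Garnet problems used in the paper's small-scale experiments are randomly generated finite MDPs parameterized by the number of states, actions, and a branching factor (Bhatnagar et al., 2007). A minimal sketch of such a generator follows; the function name, default sizes, and reward distribution are illustrative assumptions, not the paper's exact generator.

```python
import numpy as np

def make_garnet(n_states=20, n_actions=4, branching=3, seed=0):
    """Sample a random finite MDP in the Garnet family (illustrative sketch).

    Each (state, action) pair transitions to `branching` distinct next
    states, with probabilities drawn uniformly from the simplex; rewards
    are sampled i.i.d. from a standard normal (an assumption here).
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            successors = rng.choice(n_states, size=branching, replace=False)
            # Uniform point on the simplex: gaps between sorted uniforms.
            cuts = np.sort(rng.uniform(size=branching - 1))
            probs = np.diff(np.concatenate([[0.0], cuts, [1.0]]))
            P[s, a, successors] = probs
    R = rng.normal(size=(n_states, n_actions))
    return P, R
```

The returned transition tensor `P` has one valid probability distribution per (state, action) pair, which makes it easy to compute exact value functions for checking calibration of model-learning losses on small problems.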
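The aggregation described in the Research Type row (95% CIs over 20 seeds via percentile bootstrap, per Patterson et al., 2024) can be sketched as below. This is a simple, unstratified version; the paper's stratified variant, which resamples within each task before aggregating, is not reproduced here.

```python
import numpy as np

def percentile_bootstrap_ci(returns, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean return over seeds (sketch).

    returns: array of shape (n_seeds,), one final score per seed.
    Resamples seeds with replacement and takes percentiles of the
    resulting bootstrap distribution of the mean.
    """
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    boot_means = np.empty(n_boot)
    for i in range(n_boot):
        sample = rng.choice(returns, size=n, replace=True)
        boot_means[i] = sample.mean()
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return returns.mean(), (lo, hi)
```

With 20 seeds per configuration, the bootstrap avoids the normality assumption of a t-interval, which matters because per-seed returns in deep RL are often heavy-tailed.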
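The hyperparameters reported in Table 2 can be collected into a small config object for replication. The class and field names below are illustrative, not taken from the released code; only the values come from the paper.

```python
from dataclasses import dataclass

@dataclass
class CVAMLConfig:
    """Hyperparameters from Table 2 of the paper (field names are ours)."""
    discount: float = 0.99          # γ
    actor_lr: float = 3e-4          # α_π
    critic_lr: float = 3e-4         # α_Q
    model_lr: float = 3e-4          # α_p̂
    encoder_lr: float = 1e-4        # α_φ
    rollout_depth: int = 1          # m
    bootstrap_depth: int = 0        # b, varied over {0, 1}
    model_samples: int = 1          # k, varied over {1, 4}
    proportion_real: float = 0.9    # ρ, share of real data per minibatch
    latent_dim: int = 512
    grad_clip: float = 10.0
```

The varied settings (`bootstrap_depth`, `model_samples`) default here to one of their swept values and would be overridden per run, e.g. `CVAMLConfig(bootstrap_depth=1, model_samples=4)`.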