Calibrated Value-Aware Model Learning with Probabilistic Environment Models
Authors: Claas A. Voelcker, Anastasiia Pedan, Arash Ahmadian, Romina Abachi, Igor Gilitschenski, Amir-Massoud Farahmand
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper has two parts, each with a theoretical and empirical section. We answer two questions about the (m, b)-VAML family: (a) What variants of the (m, b)-VAML losses are well-calibrated to recover correct models and value functions? (b) Do we observe problems with uncalibrated losses when using standard architectures, especially deterministic latent-space models? ... To test our findings, we run the (m, b)-VAML losses on small, finite state Garnet problems (Bhatnagar et al., 2007). ... We examine the impact of the calibrated losses on a subset of DMC environments (Tunyasuvunakool et al., 2020) encompassing 7 total tasks across the humanoid and dog domains. ... We plot aggregated performance over 20 random seeds with 95% CI, estimated with stratified percentile bootstrap (Patterson et al., 2024). |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of Toronto, Canada; (2) Vector Institute, Toronto, Canada; (3) Igor Sikorsky Kyiv Polytechnic Institute, Kyiv, Ukraine; (4) Cohere, Toronto, Canada; (5) Ubisoft, Montreal, Canada; (6) Polytechnique Montreal, Canada; (7) MILA, Montreal, Canada. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It primarily presents mathematical derivations and experimental results. |
| Open Source Code | Yes | Code is provided in https://github.com/adaptive-agents-lab/CVAML. |
| Open Datasets | Yes | To test our findings, we run the (m, b)-VAML losses on small, finite state Garnet problems (Bhatnagar et al., 2007). ... We examine the impact of the calibrated losses on a subset of DMC environments (Tunyasuvunakool et al., 2020) encompassing 7 total tasks across the humanoid and dog domains. |
| Dataset Splits | No | The paper describes generating environments (Garnet problems) and using interactive reinforcement learning environments (DMC environments) where data is collected through interaction, rather than using fixed datasets with predefined training/test/validation splits. It mentions using model-generated and real environment data in minibatches for training, but not fixed dataset splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. It only generally refers to resources in the acknowledgments without specifying models or types. |
| Software Dependencies | No | The paper mentions several algorithms and libraries (e.g., the MuZero loss, TD-MPC, the rliable library) but does not provide specific version numbers for any software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | Hyperparameters can be found in Table 2: discount γ = 0.99; actor learning rate απ = 0.0003; critic learning rate αQ = 0.0003; model learning rate αp̂ = 0.0003; encoder learning rate αφ = 0.0001; model rollout depth m = 1; model bootstrap depth b = varied (0 and 1); model samples k = varied (1 and 4); proportion of real data ρ = 0.9; latent dimension = 512; gradient clipping = 10. |
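The Research Type row notes that aggregated DMC performance is reported over 20 random seeds with 95% confidence intervals estimated via a stratified percentile bootstrap (Patterson et al., 2024). As a rough illustration of that kind of estimator, here is a minimal NumPy sketch; the function name `stratified_bootstrap_ci` and the tasks-by-seeds input layout are our assumptions for illustration, not the paper's actual code.

```python
import numpy as np

def stratified_bootstrap_ci(scores, n_boot=10000, alpha=0.05, seed=0):
    """Stratified percentile-bootstrap CI for a mean score aggregated over tasks.

    `scores` is a (n_tasks, n_seeds) array of per-seed results; each task is
    treated as a stratum, so seeds are resampled with replacement *within*
    each task before averaging across tasks.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_tasks, n_seeds = scores.shape
    # For every bootstrap draw, pick seed indices independently per task.
    idx = rng.integers(0, n_seeds, size=(n_boot, n_tasks, n_seeds))
    # Gather the resampled scores and average over seeds and tasks to get
    # one aggregate value per bootstrap draw.
    boot = np.take_along_axis(scores[None, :, :], idx, axis=2).mean(axis=(1, 2))
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

With 20 seeds per task, the resulting `(lo, hi)` interval is the percentile-based 95% CI around the mean aggregate score; stratifying by task prevents tasks with more variable returns from being over- or under-represented in any single resample.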