Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning
Authors: Mohammadreza Nakhaeinezhadfard, Aidan Scannell, Joni Pajarinen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach in MuJoCo environments, showing that compared to baselines, our task representations more faithfully represent the underlying tasks, leading to outperforming prior methods in both in-distribution and out-of-distribution tasks. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Automation (EEA), Aalto University, Finland 2Finnish Center for Artificial Intelligence, Finland |
| Pseudocode | Yes | Algorithm 1: Meta-training |
| Open Source Code | Yes | Code https://github.com/MohammadrezaNakhaei/ER-TRL |
| Open Datasets | No | Offline Datasets We used soft actor-critic (SAC, Haarnoja et al. 2018) to generate the datasets and trained each agent to an expert level. The dataset consists of trajectories collected from rolling out the corresponding SAC agent at different training stages; each dataset contains 180k transitions. While the paper mentions using MuJoCo environments (Todorov, Erez, and Tassa 2012) and generating datasets, it does not provide concrete access information (link, DOI, specific repository, or citation) for the generated datasets themselves to be considered publicly available. |
| Dataset Splits | Yes | In each environment, we consider 20 tasks for training, 10 tasks for in-distribution testing, and 10 tasks for out-of-distribution testing. For each in-distribution and out-of-distribution test task, we sample 1000 transitions and embed them in task representations with the trained context encoder. We then use 80% of the samples for training and 20% for testing. |
| Hardware Specification | No | Acknowledgements We acknowledge CSC IT Center for Science, Finland, for awarding this project access to the LUMI supercomputer, owned by the Euro HPC Joint Undertaking, hosted by CSC (Finland) and the LUMI consortium through CSC. We acknowledge the computational resources provided by the Aalto Science-IT project. The paper mentions the LUMI supercomputer but does not provide specific hardware details such as CPU/GPU models, memory, or other specifications within that system. |
| Software Dependencies | No | The paper mentions using 'soft actor-critic (SAC, Haarnoja et al. 2018)' and 'BRAC (Wu, Tucker, and Nachum 2019)' as well as 'MuJoCo (Todorov, Erez, and Tassa 2012)' but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | The average normalized return for out-of-distribution test tasks is reported after 100k training steps, averaged over 5 random seeds; ± represents the standard deviation. In each environment, we consider 20 tasks for training, 10 tasks for in-distribution testing, and 10 tasks for out-of-distribution testing. For each in-distribution and out-of-distribution test task, we sample 1000 transitions and embed them in task representations with the trained context encoder. We then use 80% of the samples for training and 20% for testing. The dataset consists of trajectories collected from rolling out the corresponding SAC agent at different training stages; each dataset contains 180k transitions. |
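The task and data splits described above (20 training / 10 in-distribution / 10 out-of-distribution tasks; 1000 sampled transitions per test task with an 80/20 split) can be sketched as follows. This is a minimal illustration of the protocol as stated in the table, not the authors' implementation; the task indexing and helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Split assumed from the paper's description: 40 tasks per environment,
# partitioned into 20 train, 10 in-distribution (ID) test, 10 OOD test.
train_tasks = list(range(0, 20))
id_test_tasks = list(range(20, 30))
ood_test_tasks = list(range(30, 40))

def split_transitions(n_transitions=1000, train_frac=0.8, rng=rng):
    """Per test task: sample 1000 transitions, then split 80%/20%
    into train/test indices for evaluating the context encoder."""
    idx = rng.permutation(n_transitions)
    n_train = int(train_frac * n_transitions)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_transitions()
```

Each of the 20 test tasks (ID and OOD) would get its own 800/200 split of the 1000 embedded transitions under this protocol.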