Entropy Regularized Task Representation Learning for Offline Meta-Reinforcement Learning
Authors: Mohammadreza Nakhaeinezhadfard, Aidan Scannell, Joni Pajarinen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach in MuJoCo environments, showing that compared to baselines, our task representations more faithfully represent the underlying tasks, leading to outperforming prior methods in both in-distribution and out-of-distribution tasks. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering and Automation (EEA), Aalto University, Finland 2Finnish Center for Artificial Intelligence, Finland |
| Pseudocode | Yes | Algorithm 1: Meta-training |
| Open Source Code | Yes | Code https://github.com/MohammadrezaNakhaei/ER-TRL |
| Open Datasets | No | Offline Datasets We used soft actor-critic (SAC, Haarnoja et al. 2018) to generate the datasets and trained each agent to an expert level. The dataset consists of trajectories collected from rolling out the corresponding SAC agent at different training stages; each dataset contains 180k transitions. While the paper mentions using MuJoCo environments (Todorov, Erez, and Tassa 2012) and generating datasets, it does not provide concrete access information (link, DOI, specific repository, or citation) for the generated datasets themselves to be considered publicly available. |
| Dataset Splits | Yes | In each environment, we consider 20 tasks for training, 10 tasks for in-distribution testing, and 10 tasks for out-of-distribution testing. For each in-distribution and out-of-distribution test task, we sample 1000 transitions and embed them in task representations with the trained context encoder. We then use 80% of the samples for training and 20% for testing. |
| Hardware Specification | No | Acknowledgements We acknowledge CSC IT Center for Science, Finland, for awarding this project access to the LUMI supercomputer, owned by the Euro HPC Joint Undertaking, hosted by CSC (Finland) and the LUMI consortium through CSC. We acknowledge the computational resources provided by the Aalto Science-IT project. The paper mentions the LUMI supercomputer but does not provide specific hardware details such as CPU/GPU models, memory, or other specifications within that system. |
| Software Dependencies | No | The paper mentions using 'soft actor-critic (SAC, Haarnoja et al. 2018)' and 'BRAC (Wu, Tucker, and Nachum 2019)' as well as 'MuJoCo (Todorov, Erez, and Tassa 2012)' but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | The average normalized return for out-of-distribution test tasks is reported after 100k training steps, averaged over 5 random seeds; ± represents the standard deviation. In each environment, we consider 20 tasks for training, 10 tasks for in-distribution testing, and 10 tasks for out-of-distribution testing. For each in-distribution and out-of-distribution test task, we sample 1000 transitions and embed them in task representations with the trained context encoder. We then use 80% of the samples for training and 20% for testing. The dataset consists of trajectories collected from rolling out the corresponding SAC agent at different training stages; each dataset contains 180k transitions. |
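The task and data splits described above (20 training / 10 in-distribution / 10 out-of-distribution tasks; 1000 sampled transitions per test task with an 80/20 split) can be sketched as follows. This is a minimal illustration of the protocol as stated in the table, not the authors' implementation; the task indexing and helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Split assumed from the paper's description: 40 tasks per environment,
# partitioned into 20 train, 10 in-distribution (ID) test, 10 OOD test.
train_tasks = list(range(0, 20))
id_test_tasks = list(range(20, 30))
ood_test_tasks = list(range(30, 40))

def split_transitions(n_transitions=1000, train_frac=0.8, rng=rng):
    """Per test task: sample 1000 transitions, then split 80%/20%
    into train/test indices for evaluating the context encoder."""
    idx = rng.permutation(n_transitions)
    n_train = int(train_frac * n_transitions)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_transitions()
```

Each of the 20 test tasks (ID and OOD) would get its own 800/200 split of the 1000 embedded transitions under this protocol.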