Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

Authors: Zhi Wang, Li Zhang, Wenhao Wu, Yuanheng Zhu, Dongbin Zhao, Chunlin Chen

NeurIPS 2024

Reproducibility variables, each with the assessed result and the supporting LLM response:

Research Type: Experimental
"Experimental results on MuJoCo and Meta-World benchmarks across various dataset types show that Meta-DT exhibits superior few- and zero-shot generalization capacity compared to strong baselines while being more practical with fewer prerequisites."

Researcher Affiliation: Academia
"1 Nanjing University, 2 Institute of Automation, Chinese Academy of Sciences"

Pseudocode: Yes
"Appendix A. Algorithm Pseudocodes. Based on the implementations in Sec. 4, this section gives the brief procedures of our method. First, Algorithm 1 presents the pretraining of the context-aware world model. Then, Algorithm 2 shows the pipeline of training Meta-DT, where the sub-procedure of generating the complementary prompt is given in Algorithm 3. Finally, Algorithm 4 and Algorithm 5 show the few-shot and zero-shot evaluations on test tasks, respectively."

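To make the flow of Algorithms 1-3 concrete, here is a minimal structural sketch reconstructed from the quoted description. It is not the authors' code: every function name, the dummy trajectories, and the stand-in error scores are hypothetical placeholders (the real implementation is at https://github.com/NJU-RL/Meta-DT).

```python
import random

def pretrain_world_model(offline_trajs, steps=10):
    """Algorithm 1 (sketch): pretrain a context-aware world model on offline
    data, i.e. a context encoder plus dynamics/reward prediction heads."""
    model = {"trained_steps": steps}  # stand-in for learned parameters
    for _ in range(steps):
        _batch = random.sample(offline_trajs, k=min(4, len(offline_trajs)))
        # ... minimize prediction error of (s', r) given (s, a, context) ...
    return model

def build_prompt(world_model, trajectory, prompt_len=3):
    """Algorithm 3 (sketch): keep the segments where the world model's
    prediction error is largest, so the prompt carries information the
    context embedding misses (the 'complementary' prompt)."""
    scored = [(random.random(), seg) for seg in trajectory]  # stand-in errors
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in scored[:prompt_len]]

def train_meta_dt(world_model, offline_trajs, steps=10):
    """Algorithm 2 (sketch): train the decision transformer on sequences
    conditioned on the complementary prompt and the inferred task context."""
    for _ in range(steps):
        traj = random.choice(offline_trajs)
        prompt = build_prompt(world_model, traj)
        _sequence = prompt + traj
        # ... supervised action prediction on the conditioned sequence ...

offline_trajs = [[f"t{i}-s{j}" for j in range(8)] for i in range(6)]  # dummy data
wm = pretrain_world_model(offline_trajs)
train_meta_dt(wm, offline_trajs)
```
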
Open Source Code: Yes
"Our code is available at https://github.com/NJU-RL/Meta-DT."

Open Datasets: Yes
"We evaluate all tested methods on three classical benchmarks in meta-RL: i) the 2D navigation environment Point-Robot [25]; ii) the multi-task MuJoCo control [55, 36], containing Cheetah-Vel, Cheetah-Dir, Ant-Dir, Hopper-Param, and Walker-Param; and iii) the Meta-World manipulation platform [56], including Reach, Sweep, and Door-Lock."

Dataset Splits: No
"For each environment, we randomly sample a distribution of tasks and divide them into the training set T_train and test set T_test. ... For the Point-Robot and MuJoCo environments, we sample 45 tasks for training and another 5 held-out tasks for testing. For Meta-World environments, we sample 15 tasks for training and 5 held-out tasks for testing." (No explicit validation set is mentioned; only a train/test split of tasks is described.)

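The quoted split protocol can be pinned down in a few lines. The sketch below assumes tasks are first sampled from the task distribution and then partitioned; `split_tasks` and the placeholder task ids are hypothetical, not from the paper.

```python
import random

def split_tasks(candidate_tasks, n_train, n_test, seed=0):
    """Sample n_train + n_test tasks, then partition into T_train / T_test."""
    rng = random.Random(seed)
    sampled = rng.sample(candidate_tasks, n_train + n_test)
    return sampled[:n_train], sampled[n_train:]

candidate_tasks = list(range(200))  # placeholder task ids
train_tasks, test_tasks = split_tasks(candidate_tasks, 45, 5)  # Point-Robot / MuJoCo
mw_train, mw_test = split_tasks(candidate_tasks, 15, 5)        # Meta-World
```
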
Hardware Specification: Yes
"We train our models on one NVIDIA RTX 4080 GPU with an Intel Core i9-10900X CPU and 256 GB RAM."

Software Dependencies: No
The paper mentions implementing Meta-DT on top of the official Decision Transformer codebase and notes the optimizer (Adam) and other training parameters, but it does not specify versions for key software dependencies such as Python, PyTorch/TensorFlow, or CUDA.

Experiment Setup: Yes
"Some common hyperparameters across all reported settings are set as: optimizer Adam, weight decay 1e-4, linear warmup steps for learning rate decay 10000, gradient norm clip 0.25, dropout 0.1, and batch size 128. Table 7 presents the detailed hyperparameters of Meta-DT trained on the Point-Robot and MuJoCo domains with the Medium, Expert, and Mixed datasets. Table 8 presents the detailed hyperparameters of Meta-DT trained on Meta-World environments with the Medium datasets."

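For readers re-implementing the setup, the quoted common hyperparameters map onto standard PyTorch calls as sketched below. This is an assumption-laden sketch, not the authors' configuration: `make_optimizer_and_scheduler` and `training_step` are hypothetical helpers, and the learning rate is only a placeholder since per-domain values live in Tables 7-8.

```python
import torch

# Common hyperparameters quoted above, collected into one place.
COMMON = {
    "weight_decay": 1e-4,
    "warmup_steps": 10_000,   # linear learning-rate warmup
    "grad_norm_clip": 0.25,
    "dropout": 0.1,
    "batch_size": 128,
}

def make_optimizer_and_scheduler(model: torch.nn.Module, lr: float = 1e-4):
    # Adam with the quoted weight decay; lr itself is a placeholder here.
    optimizer = torch.optim.Adam(
        model.parameters(), lr=lr, weight_decay=COMMON["weight_decay"])
    # Linear warmup over the first 10k steps, as quoted.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda step: min((step + 1) / COMMON["warmup_steps"], 1.0))
    return optimizer, scheduler

def training_step(model, loss, optimizer, scheduler):
    optimizer.zero_grad()
    loss.backward()
    # Gradient norm clipping at 0.25, as quoted.
    torch.nn.utils.clip_grad_norm_(model.parameters(), COMMON["grad_norm_clip"])
    optimizer.step()
    scheduler.step()

# Usage on a toy model:
model = torch.nn.Linear(4, 2)
optimizer, scheduler = make_optimizer_and_scheduler(model)
```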