Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer
Authors: Yu Yang, Pan Xu
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive empirical studies demonstrate that initializing with a pre-trained language model provides prior knowledge and achieves performance similar to Prompt-DT with only 10% of the data in some MuJoCo control tasks. We also provide a thorough ablation study to validate the effectiveness of each component, including sequence modeling, language models, prompt regularizations, and prompt strategies. |
| Researcher Affiliation | Academia | Yu Yang, Department of Electrical and Computer Engineering, Duke University; Pan Xu, Department of Biostatistics and Bioinformatics, Department of Computer Science, and Department of Electrical and Computer Engineering, Duke University |
| Pseudocode | Yes | Algorithm 1 LPDT: Training and Inference |
| Open Source Code | Yes | All source code and experiment configurations are available at https://github.com/panxulab/LPDT. |
| Open Datasets | Yes | We conduct extensive experiments to assess the capability of our proposed framework in MuJoCo control environments (Fu et al., 2020) and Meta-World ML1 tasks (Yu et al., 2020). |
| Dataset Splits | Yes | We evaluate our approach over MuJoCo control tasks and Meta-World ML1 tasks. We split the tasks in these environments into a training set and a testing set. The tasks in Cheetah-dir and Ant-dir are split by directions. The tasks in Cheetah-vel are split by the goal velocities. The tasks in Point-robot are split by the goal positions, which are uniformly distributed in a unit square. In Meta-World, the tasks are defined by different goal positions. The detailed task indexes can be found in Table 9. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A6000 GPU with Intel Xeon Ice Lake Gold 5317 processors. |
| Software Dependencies | No | Table 10 lists 'Language initialization GPT-2' and 'Activation ReLU'. GPT-2 is a specific model rather than a versioned software dependency such as a programming language or library, and ReLU is an activation function, not a software component. No software components with specific version numbers are provided. |
| Experiment Setup | Yes | In this section, we show the hyperparameters of LPDT used in the experiments reported in Table 1. The hyperparameters fall into two parts: those for the transformer backbone and those for prompt regularization. We list these hyperparameters in Table 10. Table 10: Details of the hyperparameters used in our experiments in Table 1, split into model-backbone parameters and prompt-regularization parameters. |
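The task-split scheme quoted above (training tasks vs. held-out testing tasks, with Cheetah-vel tasks indexed by goal velocity) can be sketched as a small Python snippet. This is an illustrative reconstruction, not the authors' code: the `split_tasks` helper, the velocity values, and the 20% hold-out fraction are assumptions for demonstration; the actual task indexes are given in Table 9 of the paper.

```python
# Hypothetical sketch of the train/test task split described in the report.
# Actual task indexes come from Table 9 of the paper; values here are illustrative.

def split_tasks(tasks, test_fraction=0.2):
    """Deterministically hold out the last `test_fraction` of tasks for testing."""
    n_test = max(1, int(len(tasks) * test_fraction))
    return tasks[:-n_test], tasks[-n_test:]

# Cheetah-vel: each task is defined by a goal velocity (illustrative grid of 40 tasks).
cheetah_vel_tasks = [round(0.5 + 0.1 * i, 1) for i in range(40)]
train_tasks, test_tasks = split_tasks(cheetah_vel_tasks)
```

The same helper would apply unchanged to the other environments, since each task family (directions, goal velocities, goal positions) is just a list of task parameters.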