Behavior-agnostic Task Inference for Robust Offline In-context Reinforcement Learning
Authors: Long Ma, Fangwei Zhong, Yizhou Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on MuJoCo environments demonstrate that BATI effectively interprets out-of-distribution contexts and outperforms other methods, even in the presence of significant environmental noise. We conduct extensive experiments in several MuJoCo environments to evaluate the effectiveness of BATI. |
| Researcher Affiliation | Academia | 1Center for Data Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China 2Beijing Institute for General Artificial Intelligence, Beijing, China 3School of Artificial Intelligence, Beijing Normal University, Beijing, China 4School of Computer Science, Peking University, Beijing, China 5Institute for Artificial Intelligence, Peking University, Beijing, China 6State Key Laboratory of General Artificial Intelligence, Beijing, China. Correspondence to: Fangwei Zhong <EMAIL>. |
| Pseudocode | Yes | See Alg. 1 for the pseudocode of the full training pipeline. See Alg. 2 for the online evaluation procedure. |
| Open Source Code | No | Project page: https://sites.google.com/view/bati-icrl. The paper provides a project page URL, but it does not explicitly state that this page hosts the source code for the methodology described in the paper. |
| Open Datasets | Yes | We choose five representative robot locomotion environments based on the MuJoCo simulator (Todorov et al., 2012) with varying properties and levels of difficulty. |
| Dataset Splits | Yes | In each evaluation environment, we randomly sample 20 tasks as p_train(M) and another 20 tasks as p_test(M) according to its task parameterization... We split the 40 evaluation tasks in AntDir into different training and testing splits and reran the experiments. As shown in Tab. 5, across all splits, BATI achieves the best performance uniformly and is highly stable. |
| Hardware Specification | No | The paper mentions using the "MuJoCo simulator" for environments but does not specify any hardware details like GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | We implement BATI and the baselines on the same codebase with IQL (Kostrikov et al., 2022) as the base offline RL algorithm. Building on the codebase of the official implementation of UNICORN (Li et al., 2024), we implement BATI and all our baselines. The paper mentions various software components and frameworks but does not provide specific version numbers for any of them (e.g., IQL, UNICORN, CSRO, BRAC). |
| Experiment Setup | Yes | Table 3. Hyperparameters used in each of our evaluation environments. Parameter Name: Learning Rate, Batch Size, Task Contrastive Batch Size, IQL τ, IQL β, IQL Exp. Adv. Clip, # Gradient Steps, Episode Length, Dataset Size, Task Latent Dim, BATI # Latent Samples N, UNICORN Weight, CSRO CLUB Weight, CLUB Encoder Hidden Dims, Encoder Hidden Dims, Decoder Hidden Dims, RL Hidden Dims. |
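The 20/20 train/test task split quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the `split_tasks` helper and the direction-based task parameterization (in the style of AntDir) are assumptions made for the example.

```python
import random

def split_tasks(task_params, n_train=20, n_test=20, seed=0):
    """Randomly partition task parameterizations into disjoint
    train/test sets, mirroring the paper's 20-task / 20-task split."""
    rng = random.Random(seed)
    tasks = list(task_params)
    assert len(tasks) >= n_train + n_test, "not enough tasks to split"
    rng.shuffle(tasks)
    # First n_train tasks form p_train(M), the next n_test form p_test(M).
    return tasks[:n_train], tasks[n_train:n_train + n_test]

# Hypothetical example: 40 tasks parameterized by a goal direction (radians).
directions = [i * 0.157 for i in range(40)]
p_train, p_test = split_tasks(directions)
```

Re-running with different `seed` values reproduces the kind of alternative splits the paper uses for its stability check in Tab. 5.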