Behavior-agnostic Task Inference for Robust Offline In-context Reinforcement Learning

Authors: Long Ma, Fangwei Zhong, Yizhou Wang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on MuJoCo environments demonstrate that BATI effectively interprets out-of-distribution contexts and outperforms other methods, even in the presence of significant environmental noise. We conduct extensive experiments in several MuJoCo environments to evaluate the effectiveness of BATI.
Researcher Affiliation | Academia | 1 Center for Data Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China; 2 Beijing Institute for General Artificial Intelligence, Beijing, China; 3 School of Artificial Intelligence, Beijing Normal University, Beijing, China; 4 School of Computer Science, Peking University, Beijing, China; 5 Institute for Artificial Intelligence, Peking University, Beijing, China; 6 State Key Laboratory of General Artificial Intelligence, Beijing, China. Correspondence to: Fangwei Zhong <EMAIL>.
Pseudocode | Yes | See Alg. 1 for the pseudocode of the full training pipeline. See Alg. 2 for the online evaluation procedure.
Open Source Code | No | Project page: https://sites.google.com/view/bati-icrl. The paper provides a project page URL, but it does not explicitly state that this page hosts the source code for the method described in the paper.
Open Datasets | Yes | We choose five representative robot locomotion environments based on the MuJoCo simulator (Todorov et al., 2012) with varying properties and levels of difficulty.
Dataset Splits | Yes | In each evaluation environment, we randomly sample 20 tasks as ptrain(M) and another 20 tasks as ptest(M) according to its task parameterization... We split the 40 evaluation tasks in AntDir into different training and testing splits and reran the experiments. As shown in Tab. 5, across all splits, BATI achieves the best performance uniformly and is highly stable.
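The 20/20 train/test task split described above amounts to sampling two disjoint task sets from the environment's task parameterization. A minimal sketch of that step; the function name, seed, and toy parameter list are illustrative, not from the paper:

```python
import random

def split_tasks(task_params, n_train=20, n_test=20, seed=0):
    """Sample disjoint train (p_train) and test (p_test) task sets
    from a list of task parameters."""
    rng = random.Random(seed)
    sampled = rng.sample(list(task_params), n_train + n_test)
    return sampled[:n_train], sampled[n_train:]

# Example: 40 tasks drawn from a pool of 100 candidate parameterizations.
train_tasks, test_tasks = split_tasks(range(100))
```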
Hardware Specification | No | The paper mentions using the MuJoCo simulator for environments but does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | We implement BATI and the baselines on the same codebase with IQL (Kostrikov et al., 2022) as the base offline RL algorithm. Building on the codebase of the official implementation of UNICORN (Li et al., 2024), we implement BATI and all our baselines. The paper mentions various software components and frameworks but does not provide specific version numbers for any of them (e.g., IQL, UNICORN, CSRO, BRAC).
Experiment Setup | Yes | Table 3. Hyperparameters used in each of our evaluation environments. Parameter names: Learning Rate, Batch Size, Task Contrastive Batch Size, IQL τ, IQL β, IQL Exp. Adv. Clip, # Gradient Steps, Episode Length, Dataset Size, Task Latent Dim, BATI # Latent Samples N, UNICORN Weight, CSRO CLUB Weight, CLUB Encoder Hidden Dims, Encoder Hidden Dims, Decoder Hidden Dims, RL Hidden Dims.
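Among the listed hyperparameters, IQL τ is the expectile used in IQL's asymmetric value loss, and IQL β is the inverse temperature for advantage-weighted policy extraction (Kostrikov et al., 2022). A minimal NumPy sketch of the expectile loss; the τ value here is illustrative, not taken from Table 3:

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 loss at the core of IQL's value update,
    where diff = target_q - v. For tau > 0.5, positive errors
    (underestimating Q) are penalized more heavily."""
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return float(np.mean(weight * diff ** 2))
```

With tau = 0.5 this reduces to a standard (halved) mean-squared error; larger tau pushes the value estimate toward an upper expectile of the target distribution, which is why tau is tuned per environment.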