Behavioral Exploration: Learning to Explore via In-Context Adaptation
Authors: Andrew Wagenmaker, Zhiyuan Zhou, Sergey Levine
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our method in both simulated locomotion and manipulation settings, as well as on real-world robotic manipulation tasks, illustrating its ability to learn adaptive, exploratory behavior. In our experimental evaluation, our focus is on understanding (a) whether BE is able to learn effective exploration strategies from offline demonstration data and adapt quickly online, (b) if BE is able to effectively focus its exploration over the space of behaviors present in the demonstration data, and (c) if BE scales to large-scale, real-world imitation learning (IL) settings. We first focus on RL benchmarks, where we compare against RL-based approaches to exploration, and then on IL, where we consider both simulated and real-world robotic tasks. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering & Computer Science, University of California, Berkeley. Correspondence to: Andrew Wagenmaker <EMAIL>. |
| Pseudocode | No | The paper describes mathematical propositions (Proposition 4.2, Proposition A.1) and outlines an objective function (Equation 4) for Behavioral Exploration. It describes methods in paragraph text and illustrates with diagrams (Figure 1), but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "For all experiments, for both BE and BC, we use the diffusion policy architecture proposed by Dasari et al. (2024) and utilize their code base as the starting point for our method." This indicates they used third-party code, but there is no explicit statement or link provided for their own implementation of Behavioral Exploration. |
| Open Datasets | Yes | For our RL experiments, we evaluate BE on a subset of the environments in the D4RL benchmark (Fu et al., 2020), focusing in particular on settings that require exploration. In simulation, we utilize the Libero benchmark (Liu et al., 2024), which simulates a variety of robotic manipulation and pick-and-place tasks, while in the real world, we train a policy for object manipulation on the Bridge dataset (Walke et al., 2023). |
| Dataset Splits | Yes | For Antmaze, we evaluate on the medium and large variants of the maze using the diverse offline dataset, and for each test with four distinct goal locations... For Kitchen, we utilize the partial variant of the offline data. We run all experiments on the Libero 90 dataset, which includes 90 tasks spread across 21 distinct scenes. For each task, the dataset provides 50 human demonstrations of successful completion... A trial consists of 5 consecutive episodes in the same scene. |
| Hardware Specification | Yes | This research used the Savio computational cluster resource provided by the Berkeley Research Computing program at UC Berkeley. |
| Software Dependencies | No | The paper states: "For all experiments, for both BE and BC, we use the diffusion policy architecture proposed by Dasari et al. (2024) and utilize their code base as the starting point for our method." This mentions a specific architecture but does not provide specific version numbers for software components (e.g., Python, PyTorch, specific library versions). |
| Experiment Setup | Yes | Table 1: Common hyperparameters for all BE and BC experiments — learning rate: 3e-4; LR scheduler: cosine; warmup steps: 2000. Table 3: Hyperparameters for D4RL BE experiments. Table 4: Hyperparameters for D4RL BC experiments. Table 7: Hyperparameters for Libero BE and BC models. Table 9: Hyperparameters for Widow X BE and BC models. |
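The common training hyperparameters reported in Table 1 (learning rate 3e-4, cosine LR scheduler, 2000 warmup steps) can be sketched as a schedule function. This is a minimal illustration, not the paper's implementation: the linear-warmup shape, the decay-to-zero floor, and the `total_steps` parameter are assumptions, since the tables give only the values above.

```python
import math

# Hyperparameters from Table 1 of the paper.
BASE_LR = 3e-4
WARMUP_STEPS = 2000

def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a given step: linear warmup to BASE_LR over
    WARMUP_STEPS, then cosine decay toward zero (assumed floor)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    # Half-cosine from BASE_LR down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In practice this shape matches what frameworks such as PyTorch provide via a warmup schedule chained with cosine annealing; the standalone function above just makes the reported values concrete.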