Generalized Behavior Learning from Diverse Demonstrations
Authors: Varshith Sreeramdass, Rohan Paleja, Letian Chen, Sanne van Waveren, Matthew Gombolay
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate, across three continuous control benchmarks and under both in-distribution (interpolation) and out-of-distribution (extrapolation) factors, that GSD outperforms baselines in novel behavior discovery by 21%. Finally, we demonstrate that GSD can generalize striking behaviors for table tennis in a virtual testbed while leveraging human demonstrations collected in the real world. |
| Researcher Affiliation | Academia | Varshith Sreeramdass, Rohan Paleja, Letian Chen, Sanne van Waveren, Matthew Gombolay; Georgia Institute of Technology; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Guided Strategy Discovery |
| Open Source Code | Yes | Code is available at github.com/CORE-Robotics-Lab/GSD. |
| Open Datasets | Yes | The Half Cheetah environment considered in Sec. 6 is from OpenAI Gym (Brockman et al., 2016). The Fetch Pick Place environment considered is from the gym library (Brockman et al., 2016). The Drive Laneshift environment is built from the highway-env library (Leurent, 2018). |
| Dataset Splits | Yes | Splits: We divide the bounded 1D factor range into five consecutive equal-sized intervals: Interpolation: The first, third, and fifth intervals represent the train region, and the second and fourth are the test region. The split allows us to evaluate the ability to interpolate behaviors to two factor space intervals while providing three non-consecutive intervals to represent the factor. Extrapolation: The second and fourth intervals represent the train region, while the first and fifth intervals are the test region. We choose two non-consecutive intervals for the train region to have a sparse dataset while providing enough diversity to represent the factor. We use five demonstrations per interval (details in Appendix B). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used for running the experiments. While a 'Barrett WAM Arm' is mentioned for the Table Tennis setup, this refers to the robotic hardware for physical demonstrations/simulation, not the computing hardware for training models. |
| Software Dependencies | No | The paper mentions 'PyTorch (Imambi et al., 2021)' but does not specify a version number for PyTorch or any other key software libraries used in the implementation. |
| Experiment Setup | Yes | The hyperparameters used in our optimization are listed in Tables 1, 2. Each method is independently tuned for λI (and λC for Lipz, GSD) over the specified ranges, to maximize MAE over the test split for K=10, averaged over four rounds of evaluation and five train seeds. All hyperparameters omitted from the tables are set to default values from our base implementation. |
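The interpolation/extrapolation splits quoted above (five equal intervals over a bounded 1D factor range, with alternating train/test assignments) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and signature are hypothetical.

```python
def split_factor_range(low, high, mode="interpolation"):
    """Divide a bounded 1D factor range [low, high] into five
    consecutive equal-sized intervals and assign train/test regions.

    interpolation: intervals 1, 3, 5 -> train; 2, 4 -> test
    extrapolation: intervals 2, 4 -> train; 1, 5 -> test
    """
    # Six edges delimit five equal-sized intervals.
    edges = [low + (high - low) * i / 5 for i in range(6)]
    intervals = [(edges[i], edges[i + 1]) for i in range(5)]

    if mode == "interpolation":
        train_idx, test_idx = [0, 2, 4], [1, 3]
    elif mode == "extrapolation":
        train_idx, test_idx = [1, 3], [0, 4]
    else:
        raise ValueError(f"unknown mode: {mode}")

    train = [intervals[i] for i in train_idx]
    test = [intervals[i] for i in test_idx]
    return train, test
```

Under the extrapolation split, the test region consists of the outermost intervals, so evaluation requires producing behaviors for factor values outside the convex hull of the training data.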