Using Simulation to Improve Sample-Efficiency of Bayesian Optimization for Bipedal Robots
Authors: Akshara Rai, Rika Antonova, Franziska Meier, Christopher G. Atkeson
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the ATRIAS robot hardware and simulation show that our approach succeeds at sample-efficiently learning controllers for multiple robots. Another question arises: What if the simulation significantly differs from hardware? To answer this, we create increasingly approximate simulators and study the effect of increasing simulation-hardware mismatch on the performance of Bayesian optimization. We also compare our approach to other approaches from literature, and find it to be more reliable, especially in cases of high mismatch. Our experiments show that our approach succeeds across different controller types, bipedal robot models and simulator fidelity levels, making it applicable to a wide range of bipedal locomotion problems. |
| Researcher Affiliation | Academia | Akshara Rai EMAIL Robotics Institute, School of Computer Science Carnegie Mellon University, PA, USA. Rika Antonova EMAIL Robotics, Perception and Learning, CSC KTH Royal Institute of Technology, Stockholm, Sweden. Franziska Meier EMAIL Paul G. Allen School of Computer Science & Engineering University of Washington, Seattle, WA, USA. Christopher G. Atkeson EMAIL Robotics Institute, School of Computer Science Carnegie Mellon University, PA, USA. |
| Pseudocode | No | The paper describes methods and approaches in detail, such as the Determinants of Gait Transform and Neural Network based transform, but does not present them in pseudocode or a clearly labeled algorithm block. Steps are described within the regular flow of text. |
| Open Source Code | No | The paper does not contain an explicit statement offering access to the source code for the methodology described, nor does it provide a link to a code repository. It mentions using 'Our implementation of BO was based on the framework in Gardner et al. (2014)', but this refers to a third-party framework, not the authors' own implementation. |
| Open Datasets | No | The paper does not mention the use of any publicly available datasets in the traditional sense (e.g., ImageNet, CIFAR-10). Instead, it describes experiments conducted on the ATRIAS robot hardware and various simulators, where data is generated through these experiments (e.g., 'a Sobol grid of controller parameters ($x_{1:N}$, $N \approx 0.5$ million) along with trajectory summaries $\xi_{x_i}$ from simulation'). No specific links or citations to publicly available datasets used for the core experiments are provided. |
| Dataset Splits | No | The paper describes the generation of data through simulations (e.g., 'a Sobol grid of controller parameters ($x_{1:N}$, $N \approx 0.5$ million) along with trajectory summaries $\xi_{x_i}$ from simulation') and hardware experiments. It details how these generated data points are used for constructing priors ('To create cost prior for experiments in Section 5.3 we collected 50,000 evaluations of 30s trials for a range of controller parameters. Then we conducted 50 runs, using random subsets of 35,000 evaluations to construct the prior.'). However, it does not specify traditional training, validation, or test dataset splits in the context of evaluating a machine learning model on a fixed dataset. |
| Hardware Specification | No | The paper describes the physical ATRIAS robot (e.g., 'ATRIAS is a parallel bipedal robot, weighing 64kg') and mentions that a previous work by Cully et al. (2015) used 'a 16-core computer'. However, it does not provide specific details about the computational hardware (e.g., GPU/CPU models, memory, or cloud instance types) used by the authors to run their own simulations or Bayesian Optimization experiments. |
| Software Dependencies | No | The paper mentions several frameworks and tools, such as 'Our implementation of BO was based on the framework in Gardner et al. (2014)', 'We used Expected Improvement (EI) acquisition function (Mockus et al., 1978)', and 'sparse GP construction provided by Rasmussen and Nickisch (2010)'. However, it does not provide specific version numbers for any software libraries, programming languages, or solvers used to implement their methodology. |
| Experiment Setup | Yes | Hyper-parameters for BO were initialized to default values: 0 for mean offset, 1.0 for kernel length scales and signal variance, 0.1 for $\sigma_n$ (noise parameter). Hyperparameters were optimized using the marginal likelihood (Shahriari et al. (2016), Section V-A). ... The NN was trained using a mean squared loss. NN input: $x$, a set of controller parameters; NN output: $\phi^{NN}_{traj}(x) = \hat{\xi}_x$, the reconstructed trajectory summary; loss: $\frac{1}{N}\sum_{i=1}^{N}\|\hat{\xi}_{x_i} - \xi_{x_i}\|^2$. ... For experiments with the 16D controller in Section 5.2, for example, the hidden layers contained 512, 128, and 32 units; the NN was trained on 100K simulated examples to reconstruct 8D trajectory summaries (see next-to-last row of Table 2). ... The cost function used in our experiments is a slight modification of the cost used in Song and Geyer (2015): $cost = 100 - x_{fall}$ if fall, $\|v_{avg} - v_{tgt}\|$ if walk (Eq. 6). ... Each run typically consists of 10 experiments on the robot. All BO runs start from scratch, with an uninformed GP prior. At the end of the run, the GP posterior has 10 data points, depending on the experiment. |
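The experiment-setup cell quotes two small formulas: the walking cost (Eq. 6, modified from Song and Geyer, 2015) and the mean squared reconstruction loss used to train the trajectory-summary NN. A minimal sketch of both is below; the function names (`walking_cost`, `summary_reconstruction_loss`) and signatures are hypothetical illustrations of the quoted equations, not the authors' code, and `x_fall` is assumed to be the distance walked before a fall, with `v_avg`/`v_tgt` the average and target velocities per the paper's notation.

```python
import numpy as np

def walking_cost(fell, x_fall, v_avg, v_tgt):
    """Eq. 6 sketch: a fall is penalized by 100 minus distance walked;
    a successful walk is scored by deviation from the target velocity."""
    if fell:
        return 100.0 - x_fall
    diff = np.asarray(v_avg, dtype=float) - np.asarray(v_tgt, dtype=float)
    return float(np.linalg.norm(diff))

def summary_reconstruction_loss(xi_hat, xi):
    """Mean over N examples of the squared L2 error between reconstructed
    trajectory summaries xi_hat and simulated summaries xi."""
    xi_hat = np.asarray(xi_hat, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return float(np.mean(np.sum((xi_hat - xi) ** 2, axis=1)))
```

For example, a trial that falls after walking 3 m would score `walking_cost(True, 3.0, ...) == 97.0`, while a walk averaging 1.0 m/s against a 0.6 m/s target scores 0.4.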