Generating Freeform Endoskeletal Robots
Authors: Muhan Li, Lingji Kong, Sam Kriegman
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we introduced the computational design of freeform endoskeletal robots. ... We found that a universal controller could be simultaneously obtained by reinforcement learning during morphological evolution despite the wide range of endoskeletal geometries and topologies that emerged within the evolving population. We observed across several independent experimental trials that morphological evolution tended to push search into promising regions of latent space consisting of high performing body plans that facilitated policy training. For flat ground, this resulted in the evolution of simple two-jointed snakes (Fig. 6A), which in hindsight makes intuitive sense: such bodies are easier to control. But, in the other three task environments we tested, many other, more complex solutions evolved, including legged locomotion (Fig. 6B). |
| Researcher Affiliation | Academia | Muhan Li, Lingji Kong, Sam Kriegman Northwestern University |
| Pseudocode | Yes | Synthetic data generation pseudocode and hyperparameters can be found in Appx E. ... Algorithm 1 Procedural Generation of Synthetic Robot Body Plan |
| Open Source Code | Yes | Videos and code at https://endoskeletal.github.io. |
| Open Datasets | No | We began by procedurally generating synthetic training data examples of valid endoskeletal body plans in 3D voxel space using multi-star graphs (Fig. 10A)... Synthetic data generation pseudocode and hyperparameters can be found in Appx E. The resulting body was voxelized within a 64 × 64 × 64 Cartesian grid... Because we can generate new synthetic data points on demand, our training data is unlimited, and the depth of our autoencoder and thus its potential to compress, generalize, and capture complex hierarchical features is not constrained by lack of data. |
| Dataset Splits | No | A variational autoencoder (VAE; Kingma & Welling (2013)) with four blocks of Voxception-ResNet modules (Brock et al., 2016) was then trained to map voxel space into a highly compressed latent distribution consisting of 512 latent dimensions. Prior work that used voxel-based autoencoders considered a fixed dataset, which limited their depth and required data augmentation to avoid overfitting (Brock et al., 2016). Because we can generate new synthetic data points on demand, our training data is unlimited... The population of designs fed to the universal controller was optimized by covariance matrix adaptation evolution strategies (CMA-ES; Hansen & Ostermeier (2001)). Briefly, a multivariate normal distribution of designs is sampled from the latent space and the mean vector is pulled toward the latent coordinates of sampled designs with the highest fitness... We employed Proximal Policy Optimization (PPO; Schulman et al. (2017)) to train a single universal controller for an evolving population of 64 endoskeletal robots. A clone of each design in the population was created, yielding a batch of 128 designs. |
| Hardware Specification | Yes | Training 64 independent policies for 30 epochs required 44.8 hours on 4 NVIDIA H100 SXM GPUs. |
| Software Dependencies | No | The paper discusses the use of a variational autoencoder (VAE) with Voxception-ResNet modules, Proximal Policy Optimization (PPO), and Graph Transformers. It also mentions covariance matrix adaptation evolution strategies (CMA-ES) and a multi-physics voxel-based simulator employing Euler-Bernoulli beams and rigid body dynamics. However, it does not provide specific version numbers for any software libraries or frameworks used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | Action space. The action space consists of joint angles, which control the movements and articulation of the robot's internal skeleton. We use a uniform discrete angle action space A = {-1.4 rad, -0.7 rad, 0 rad, 0.7 rad, 1.4 rad} for all joints. ... Reward was based on net displacement of the robot across 100 pairs of observations and actions sampled at 10Hz during a simulation episode of 10 seconds. ... Policy. We employed Proximal Policy Optimization (PPO; Schulman et al. (2017)) to train a single universal controller for an evolving population of 64 endoskeletal robots. ... Table 7: Simulation configuration hyperparameters. Table 8: Reinforcement learning and evolution hyperparameters. |
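The synthetic-data excerpt above describes procedurally generating body plans as multi-star graphs and voxelizing them into a 64 × 64 × 64 grid. The following is a minimal sketch of that idea, not the paper's Algorithm 1: the function name, parameter names, and the segment-stamping rasterization are all assumptions made for illustration.

```python
import numpy as np

def voxelize_multi_star(n_stars=3, rays_per_star=4, grid=64, seed=0):
    """Sketch of procedural body-plan generation: build a multi-star graph
    (chained hubs, each with radiating limb segments) and rasterize its
    edges into a grid**3 boolean occupancy volume."""
    rng = np.random.default_rng(seed)
    vox = np.zeros((grid, grid, grid), dtype=bool)

    def draw_segment(a, b, thickness=2):
        # Sample points along the segment and stamp a small cube at each,
        # giving the skeleton edge a solid body in voxel space.
        for t in np.linspace(0.0, 1.0, grid):
            p = np.round(a + t * (b - a)).astype(int)
            lo = np.clip(p - thickness, 0, grid - 1)
            hi = np.clip(p + thickness, 0, grid - 1)
            vox[lo[0]:hi[0] + 1, lo[1]:hi[1] + 1, lo[2]:hi[2] + 1] = True

    # Chain star hubs together, starting from the grid center.
    hubs = [np.array([grid // 2] * 3, dtype=float)]
    for _ in range(n_stars - 1):
        hubs.append(np.clip(hubs[-1] + rng.integers(-12, 13, size=3), 8, grid - 9))
    for prev, cur in zip(hubs, hubs[1:]):
        draw_segment(prev, cur)
    # Radiate limb segments from each hub to form the "star" arms.
    for hub in hubs:
        for _ in range(rays_per_star):
            tip = np.clip(hub + rng.integers(-10, 11, size=3), 0, grid - 1)
            draw_segment(hub, tip)
    return vox
```

Generating a fresh volume per call is what makes the training stream effectively unlimited: each seed yields a distinct valid body plan, so no fixed dataset (or split) is needed.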
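The design-optimization excerpt describes sampling a multivariate normal distribution of designs in latent space and pulling its mean toward the fittest samples. A stripped-down evolution-strategy step capturing just that mean update might look as follows; the paper uses full CMA-ES, which also adapts the covariance matrix, and the placeholder fitness function here stands in for "decode latent → simulate robot → measure displacement".

```python
import numpy as np

def es_step(mean, sigma, fitness_fn, pop_size=64, elite_frac=0.25, rng=None):
    """One simplified evolution-strategy step: sample designs around the
    current mean, then pull the mean toward the highest-fitness samples.
    (CMA-ES proper would also update a full covariance matrix.)"""
    rng = rng if rng is not None else np.random.default_rng()
    latent_dim = mean.shape[0]
    # Isotropic Gaussian sampling around the current mean design.
    samples = mean + sigma * rng.standard_normal((pop_size, latent_dim))
    fitness = np.array([fitness_fn(z) for z in samples])
    n_elite = max(1, int(elite_frac * pop_size))
    elite = samples[np.argsort(fitness)[-n_elite:]]  # best designs
    return elite.mean(axis=0)  # new mean, pulled toward high fitness

# Hypothetical fitness: negative distance of the latent point from the
# all-ones vector (a stand-in for simulated locomotion performance).
new_mean = es_step(
    np.zeros(512), sigma=0.1,
    fitness_fn=lambda z: -np.linalg.norm(z - 1.0),
    rng=np.random.default_rng(0),
)
```

With 512 latent dimensions and a population of 64, this matches the scale quoted in the report, though the elite fraction and step size here are illustrative choices.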
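The experiment-setup excerpt specifies a five-angle discrete action space per joint and a reward based on net displacement over 100 observation-action pairs sampled at 10 Hz. A small sketch of those two pieces, assuming a hypothetical `positions` trajectory array (the paper's exact reward implementation is not shown):

```python
import numpy as np

# Discrete joint-angle action space from the paper: five target angles.
ANGLES = np.array([-1.4, -0.7, 0.0, 0.7, 1.4])  # radians

def action_to_angles(indices):
    """Map per-joint discrete action indices (0..4) to target joint angles."""
    return ANGLES[np.asarray(indices)]

def episode_reward(positions):
    """Reward as net displacement of the robot over one episode:
    100 steps sampled at 10 Hz, i.e. 10 simulated seconds. `positions`
    is a (101, 3) array of xyz positions (initial pose + 100 steps);
    this helper and its signature are illustrative assumptions."""
    assert positions.shape[0] == 101
    return float(np.linalg.norm(positions[-1] - positions[0]))
```

Note that net displacement (straight-line distance between start and end) differs from path length: a robot that wanders in a circle earns near-zero reward, which pushes the search toward directed locomotion.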