Generating Freeform Endoskeletal Robots

Authors: Muhan Li, Lingji Kong, Sam Kriegman

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we introduced the computational design of freeform endoskeletal robots. ... We found that a universal controller could be simultaneously obtained by reinforcement learning during morphological evolution despite the wide range of endoskeletal geometries and topologies that emerged within the evolving population. We observed across several independent experimental trials that morphological evolution tended to push search into promising regions of latent space consisting of high performing body plans that facilitated policy training. For flat ground, this resulted in the evolution of simple two-jointed snakes (Fig. 6A), which in hindsight makes intuitive sense: such bodies are easier to control. But, in the other three task environments we tested, many other, more complex solutions evolved, including legged locomotion (Fig. 6B).
Researcher Affiliation | Academia | Muhan Li, Lingji Kong, Sam Kriegman (Northwestern University)
Pseudocode | Yes | Synthetic data generation pseudocode and hyperparameters can be found in Appx E. ... Algorithm 1: Procedural Generation of Synthetic Robot Body Plan
Open Source Code | Yes | Videos and code at https://endoskeletal.github.io.
Open Datasets | No | We began by procedurally generating synthetic training data examples of valid endoskeletal body plans in 3D voxel space using multi-star graphs (Fig. 10A)... Synthetic data generation pseudocode and hyperparameters can be found in Appx E. The resulting body was voxelized within a 64 × 64 × 64 Cartesian grid... Because we can generate new synthetic data points on demand, our training data is unlimited, and the depth of our autoencoder, and thus its potential to compress, generalize, and capture complex hierarchical features, is not constrained by lack of data.
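The quoted pipeline (grow a multi-star graph skeleton, then voxelize it into a 64 × 64 × 64 grid) can be sketched as follows. This is not the paper's Algorithm 1: the skeleton-growing rules, star counts, arm lengths, and thickening radius below are all illustrative assumptions.

```python
import numpy as np

GRID = 64  # the paper voxelizes bodies within a 64 x 64 x 64 Cartesian grid

def random_multistar_skeleton(rng, n_stars=3, arms_per_star=4, arm_len=8.0):
    """Hypothetical multi-star skeleton: a chain of 'star' centers, each
    sprouting several straight arms (line segments) in random 3D directions."""
    points = []
    center = np.array([GRID // 2] * 3, dtype=float)
    for _ in range(n_stars):
        for _ in range(arms_per_star):
            direction = rng.normal(size=3)
            direction /= np.linalg.norm(direction)
            for t in np.linspace(0.0, arm_len, num=16):
                points.append(center + t * direction)
        # hop to the next star center a random step away
        step = rng.normal(size=3)
        center = center + 6.0 * step / np.linalg.norm(step)
    return np.array(points)

def voxelize(points, radius=2):
    """Mark every voxel near a skeleton point as occupied (thickens the
    1D skeleton into a solid 3D body)."""
    grid = np.zeros((GRID, GRID, GRID), dtype=bool)
    for p in points:
        lo = np.clip(np.floor(p - radius).astype(int), 0, GRID - 1)
        hi = np.clip(np.ceil(p + radius).astype(int) + 1, 0, GRID)
        grid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = True
    return grid

rng = np.random.default_rng(0)
body = voxelize(random_multistar_skeleton(rng))
print(body.shape, int(body.sum()))
```

Because generation is a cheap random procedure, fresh training examples can be drawn on demand, which is the property the paper uses to avoid a fixed dataset.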
Dataset Splits | No | A variational autoencoder (VAE; Kingma & Welling (2013)) with four blocks of Voxception-ResNet modules (Brock et al., 2016) was then trained to map voxel space into a highly compressed latent distribution consisting of 512 latent dimensions. Prior work that used voxel-based autoencoders considered a fixed dataset, which limited their depth and required data augmentation to avoid overfitting (Brock et al., 2016). Because we can generate new synthetic data points on demand, our training data is unlimited... The population of designs fed to the universal controller was optimized by covariance matrix evolutionary strategies (CMA-ES; Hansen & Ostermeier (2001)). Briefly, a multivariate normal distribution of designs is sampled from the latent space and the mean vector is pulled toward the latent coordinates of sampled designs with the highest fitness... We employed Proximal Policy Optimization (PPO; Schulman et al. (2017)) to train a single universal controller for an evolving population of 64 endoskeletal robots. A clone of each design in the population was created, yielding a batch of 128 designs.
Hardware Specification | Yes | Training 64 independent policies for 30 epochs required 44.8 hours on 4 NVIDIA H100 SXM GPUs.
Software Dependencies | No | The paper discusses the use of a variational autoencoder (VAE) with Voxception-ResNet modules, Proximal Policy Optimization (PPO), and Graph Transformers. It also mentions covariance matrix evolutionary strategies (CMA-ES) and a multi-physics voxel-based simulator employing Euler-Bernoulli beams and rigid body dynamics. However, it does not provide specific version numbers for any software libraries or frameworks used (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | Action space. The action space consists of joint angles, which control the movements and articulation of the robot's internal skeleton. We use a uniform discrete angle action space A = {-1.4 rad, -0.7 rad, 0 rad, 0.7 rad, 1.4 rad} for all joints. ... Reward was based on net displacement of the robot across 100 pairs of observations and actions sampled at 10 Hz during a simulation episode of 10 seconds. ... Policy. We employed Proximal Policy Optimization (PPO; Schulman et al. (2017)) to train a single universal controller for an evolving population of 64 endoskeletal robots. ... Table 7: Simulation configuration hyperparameters. Table 8: Reinforcement learning and evolution hyperparameters.
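The quoted setup pins down the action space and the reward signal. A minimal sketch, assuming the reward is simply the straight-line net displacement of the robot's root over the episode (the paper's exact reward shaping may differ):

```python
import numpy as np

# Discrete joint-angle action space quoted from the paper (radians).
ACTIONS = [-1.4, -0.7, 0.0, 0.7, 1.4]

def net_displacement_reward(root_positions):
    """Net displacement of the robot's root across an episode of 100
    action steps sampled at 10 Hz (a 10-second episode). Illustrative
    assumption: 100 transitions give 101 sampled root positions."""
    p = np.asarray(root_positions)
    assert p.shape[0] == 101
    return float(np.linalg.norm(p[-1] - p[0]))

# toy rollout: the root drifts 5 m along x over the 10-second episode
traj = np.zeros((101, 3))
traj[:, 0] = np.linspace(0.0, 5.0, 101)
reward = net_displacement_reward(traj)
print(len(ACTIONS), reward)
```

Using net displacement (rather than summed per-step speed) rewards sustained directed travel and gives no credit for oscillating in place.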