Flow-based Domain Randomization for Learning and Sequencing Robotic Skills
Authors: Aidan Curtis, Eric Li, Michael Noseworthy, Nishad Gothoskar, Sachin Chitta, Hui Li, Leslie Pack Kaelbling, Nicole E Carey
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this architecture is more flexible and provides greater robustness than existing approaches that learn simpler, parameterized sampling distributions, as demonstrated in six simulated and one real-world robotics domain. |
| Researcher Affiliation | Collaboration | ¹MIT CSAIL, ²Autodesk Research. Correspondence to: Aidan Curtis <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (GoFlow); Algorithm 2 (Belief-Space Planner Using BFS) |
| Open Source Code | Yes | A.1. Code Release The codebase for the project can be found here. |
| Open Datasets | No | The paper uses simulated environments like Cartpole, Ant, Quadcopter, Quadruped, and Humanoid in the Isaac Lab suite of environments (Mittal et al., 2023) and a custom gear insertion task. While Isaac Lab is a known simulation environment, the paper does not explicitly state the use or provision of a publicly available dataset in the traditional sense, but rather generates data dynamically through domain randomization within these environments. No specific dataset is cited with access information. |
| Dataset Splits | No | The paper describes how evaluation is performed through policy rollouts on 4096 uniformly selected environment initialization samples, and training uses environments sampled from a distribution. However, it does not specify explicit training/test/validation dataset splits, as the nature of domain randomization involves dynamically generated environments rather than fixed datasets. |
| Hardware Specification | No | The paper mentions the use of a "Franka Emika robot" for real-world experiments. However, it does not provide specific details about the computing hardware (e.g., GPU models, CPU types, memory) used for training the reinforcement learning policies in simulation. |
| Software Dependencies | No | The paper mentions using the Proximal Policy Optimization (PPO) algorithm, the Zuko normalizing flow library (Rozet et al., 2022), the Isaac Lab suite of environments (Mittal et al., 2023), the IndustReal library built on the frankapy toolkit (Tang et al., 2023a; Zhang et al., 2020), and the Bayes3D framework (Gothoskar et al., 2023). However, no specific version numbers are provided for any of these software components. |
| Experiment Setup | Yes | A.5. Hyperparameters: We search over the following values of the α hyperparameter: [0.1, 0.5, 1.0, 1.5, 2.0], and over the following values of the β hyperparameter: [0.0, 0.1, 0.5, 1.0, 2.0]. Other hyperparameters include the number of network updates per training epoch (K = 100), the network learning rate (η_ϕ = 1e-3), and neural spline flow architecture hyperparameters such as network depth (ℓ = 3), hidden features (64), and number of bins (8). |
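The hyperparameter search quoted above is a small grid over α and β with the remaining settings held fixed. A minimal sketch of such a grid search, using only the values reported in Appendix A.5 (the `evaluate` scoring function is hypothetical, standing in for training a policy and measuring its success rate):

```python
from itertools import product

# Search values reported in the paper's Appendix A.5.
ALPHAS = [0.1, 0.5, 1.0, 1.5, 2.0]
BETAS = [0.0, 0.1, 0.5, 1.0, 2.0]

# Fixed hyperparameters from the same appendix.
FIXED = {
    "updates_per_epoch": 100,   # K
    "learning_rate": 1e-3,      # eta_phi
    "flow_depth": 3,            # neural spline flow depth (l)
    "hidden_features": 64,
    "num_bins": 8,
}

def grid_search(evaluate):
    """Return the (alpha, beta) pair that maximizes `evaluate`.

    `evaluate(alpha, beta)` is a placeholder for the real procedure
    (train with these hyperparameters, then score policy rollouts);
    the paper does not specify the selection criterion in this quote.
    """
    return max(product(ALPHAS, BETAS), key=lambda ab: evaluate(*ab))

# Purely illustrative toy scoring function:
best_alpha, best_beta = grid_search(
    lambda a, b: -(a - 1.0) ** 2 - (b - 0.5) ** 2
)
print(best_alpha, best_beta)  # → 1.0 0.5
```

The toy score peaks at α = 1.0, β = 0.5, both of which lie on the grid, so the search recovers them exactly.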