Hardware Conditioned Policies for Multi-Robot Transfer Learning
Authors: Tao Chen, Adithyavairavan Murali, Abhinav Gupta
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our aim is to demonstrate the importance of conditioning the policy on a hardware representation v_h for transferring complicated policies between dissimilar robotic agents. We show performance gains in two diverse settings: manipulation and hopper locomotion. (A minimal sketch of such conditioning appears after the table.) |
| Researcher Affiliation | Academia | Tao Chen, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, EMAIL; Adithyavairavan Murali, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, EMAIL; Abhinav Gupta, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, EMAIL |
| Pseudocode | Yes | Algorithm 1 Hardware Conditioned Policies (HCP) |
| Open Source Code | No | The paper provides a link to videos of the experiments but does not provide access to the source code for the methodology described. |
| Open Datasets | No | The paper describes creating custom robot manipulators and varying their properties within the MuJoCo simulation environment, but it does not provide access information (link, DOI, or citation) for a publicly available dataset. |
| Dataset Splits | Yes | We performed several leave-one-out experiments (train on 8 robot types, leave 1 robot type untouched) on these robot types. (See the split sketch after the table.) |
| Hardware Specification | No | The paper mentions running experiments on a 'real Sawyer robot' but does not specify the computing hardware (e.g., CPU, GPU models, memory) used for training models or running simulations. |
| Software Dependencies | No | The paper mentions using MuJoCo as a physics engine and specific DRL algorithms (PPO, DDPG+HER) but does not provide specific version numbers for software libraries, programming languages, or other ancillary dependencies. |
| Experiment Setup | Yes | Rewards: We use a binary sparse reward setting because sparse rewards are more realistic in robotics applications, and we use DDPG+HER as the backbone training algorithm. The agent only gets a +1 reward if the POI is within ε Euclidean distance of the desired goal position; otherwise, it gets a -1 reward. We use ε = 0.02 m in all experiments. (A sketch of this reward appears after the table.) |
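
As a minimal sketch of the hardware-conditioning idea referenced in the Research Type and Pseudocode rows, assuming a PyTorch actor network; the class name, dimensions, and hidden sizes are illustrative assumptions, not the paper's reported architecture:

```python
import torch
import torch.nn as nn

class HardwareConditionedPolicy(nn.Module):
    """Illustrative actor conditioned on a hardware representation v_h.

    The state and the hardware vector are concatenated before the first
    layer, so a single network can act for dissimilar robots. All sizes
    here are assumptions, not the paper's actual architecture.
    """

    def __init__(self, state_dim, hardware_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + hardware_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # bound actions to [-1, 1]
        )

    def forward(self, state, v_h):
        # pi(a | s, v_h): the policy input is the concatenation [s; v_h]
        return self.net(torch.cat([state, v_h], dim=-1))
```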
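
The leave-one-out protocol from the Dataset Splits row (train on 8 robot types, hold 1 out, implying 9 types total) can be sketched as below; the robot-type labels and function name are hypothetical placeholders:

```python
def leave_one_out_splits(robot_types):
    """Yield (train_types, held_out_type) pairs: each fold trains on
    all robot types except one and evaluates transfer on the held-out
    type. With 9 types this gives 8-train / 1-test folds."""
    for held_out in robot_types:
        train = [r for r in robot_types if r != held_out]
        yield train, held_out

# Hypothetical labels for the 9 robot types
robot_types = [f"robot_type_{i}" for i in range(9)]
for train_types, test_type in leave_one_out_splits(robot_types):
    pass  # train HCP on train_types, evaluate transfer on test_type
```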
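
The binary sparse reward described in the Experiment Setup row, as a short sketch; the function name and NumPy representation are assumptions:

```python
import numpy as np

EPSILON = 0.02  # meters, the threshold reported in the paper

def sparse_reward(poi_pos, goal_pos, epsilon=EPSILON):
    """+1 if the point of interest (POI) is within epsilon Euclidean
    distance of the desired goal position, -1 otherwise."""
    return 1.0 if np.linalg.norm(poi_pos - goal_pos) < epsilon else -1.0
```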