Perturbation Training for Human-Robot Teams

Authors: Ramya Ramakrishnan, Chongjie Zhang, Julie Shah

JAIR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we design and evaluate a computational learning model that enables a human-robot team to co-develop joint strategies for performing novel tasks that require coordination. ... We empirically validate the benefits of AdaPT through comparison to other hybrid reinforcement and transfer learning techniques ... Results from large-scale human subject experiments (n=48) indicate that AdaPT enables an agent to learn in a manner compatible with a human's own learning process, and that a robot undergoing perturbation training with a human results in a high level of team performance. Finally, we demonstrate that human-robot training using AdaPT in a simulation environment produces effective performance for a team incorporating an embodied robot partner.
Researcher Affiliation | Academia | Ramya Ramakrishnan (EMAIL), Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139; Chongjie Zhang (EMAIL), Tsinghua University, 30 Shuangqing Rd, Haidian Qu, Beijing Shi, China; Julie Shah (EMAIL), Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139
Pseudocode | Yes | Figure 3: The Adaptive Perturbation Training (AdaPT) algorithm takes as input a new task and a library of value functions learned from previous tasks, among other parameters. ... Algorithm: AdaPT(Ω, {Q1, ..., QN}, τ, Δτ, K, H, γ, α, ϵ) ... Figure 4: π-reuse is a subfunction of PRQL that executes one episode of the task (from initialization to goal state) by using the chosen past policy to guide the learning of a new value function. ... Algorithm: π-reuse(QΩ, Πpast, ψ, v, H, γ, α, ϵ) ... Figure 5: Update-QValues is a subfunction of AdaPT that executes one episode of the task (from initialization to goal state) and updates all Q-value functions to adapt to the new task. ... Algorithm: Update-QValues(Q1, ..., QN, c, H, γ, α, ϵ) ... Figure 9: The Human-Robot-Communicate algorithm provides a computational framework for the robot to make decisions when communicating with a person. ... Algorithm: Human-Robot-Communicate(⟨a_h, a_r⟩, ϵsugg, ϵacc, Q(s, a_h, a_r), Q(s, a_r))
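Since no source code accompanies the paper, the π-reuse subroutine described in Figure 4 can only be sketched. The snippet below is a minimal illustration of the standard π-reuse action-selection rule from the PRQL literature, which the figure's description matches: with probability ψ the past policy is followed, and otherwise the agent acts ϵ-greedily on the value function being learned. All function and variable names are placeholders, not the authors' implementation.

```python
import random

def pi_reuse_action(state, q_new, past_policy, psi, epsilon, actions):
    """One action choice under the pi-reuse exploration strategy.

    With probability psi, follow the past policy; otherwise act
    epsilon-greedily on the value function being learned (q_new).
    """
    if random.random() < psi:
        return past_policy(state)
    if random.random() < epsilon:
        return random.choice(actions)
    # Greedy action on the new Q-values (ties broken by list order).
    return max(actions, key=lambda a: q_new.get((state, a), 0.0))
```

In PRQL, ψ is typically decayed by the factor v after each step of the episode, so control shifts from the past policy to the newly learned value function as the episode proceeds.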
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, a direct link to a code repository, or mention of code in supplementary materials for the methodology described.
Open Datasets | No | The paper describes custom-designed simulation environments for a "Fire Extinguishing Task" and a "Grid World Task" whose state and action spaces are defined within the paper. There are no mentions of standard public datasets, direct links, DOIs, or citations to specific public dataset resources used for the experiments.
Dataset Splits | No | The paper describes experimental phases with different "task variants" for training and testing (e.g., "Perturbation teams trained together on three different task variants, two times each. ... The teams were then tested on three new variants of the task"), but it does not refer to static datasets with specific training/test/validation splits in terms of percentages, sample counts, or citations to predefined data partitions. The data is generated through interaction in a simulated environment.
Hardware Specification | Yes | Finally, we demonstrate in robot experiments with a PR2 (n=12 human-robot teams) that human-robot training using AdaPT in a simulation environment produces effective performance for a team incorporating an embodied robot partner. ... We recruited 12 participants from a university campus and assigned them all to the perturbation AdaPT condition. ... For the embodied robot experiments, the participants trained in simulation and then worked with a PR2 robot during three testing sessions, which were identical to those in the simulation experiments.
Software Dependencies | No | The paper mentions several algorithms and frameworks, such as Q-learning, PRQL, RBDist, and Google web speech recognition, but it does not specify version numbers for these software components or for other libraries used in the implementation.
Experiment Setup | Yes | In our experiments, we used γ = 1 to represent a finite horizon problem. Other parameters were initialized as follows: ϵ = 0.1, α = 0.05, τ = 0, Δτ = 0.01, K = 400,000 for training and 2,000 for testing, and H = 30. For PRQL, the additional ψ parameter was initially set to 1 and the v parameter to 0.95. For PRQL-RBDist, we used 5,000 data points of the form ⟨s, a, s′⟩ for each task. ... Parameters for the Grid World task were initialized as follows: γ = 1, ϵ = 0.1, α = 0.05, τ = 0, Δτ = 0.01, K = 1,000,000 for training and 5,000 for testing, and H = 30. For PRQL, the additional ψ parameter was initialized to 1 and the v parameter to 0.95. For PRQL-RBDist, we used 10,000 data points of the form ⟨s, a, s′⟩ for each task.
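To make the reported parameters concrete, here is a minimal tabular Q-learning sketch that plugs in γ = 1, α = 0.05, ϵ = 0.1, and H = 30. The three-state chain task, episode count, and all names are illustrative assumptions only; this is not the paper's Fire Extinguishing or Grid World domain.

```python
import random

def q_learning(n_states=3, episodes=2000, gamma=1.0, alpha=0.05,
               epsilon=0.1, horizon=30, seed=0):
    """Tabular Q-learning on a hypothetical chain task: start at state 0,
    step left (-1) or right (+1), and receive reward 1 on reaching the
    rightmost state, which ends the episode."""
    rng = random.Random(seed)
    actions = [-1, +1]
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):  # finite horizon H, as in the paper
            # Epsilon-greedy action selection (ties broken by list order).
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: q[(s, b)])
            s_next = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            best_next = max(q[(s_next, b)] for b in actions)
            # Standard Q-learning update with the listed gamma and alpha.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
            if r > 0:  # goal reached; episode terminates
                break
    return q
```

After training, the learned values favor stepping right toward the goal; note that with γ = 1 and no step cost, all Q-values eventually drift toward 1, which is why the paper's use of γ = 1 is tied to a finite-horizon formulation.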