Relational Reinforcement Learning for Planning with Exogenous Effects
Authors: David Martínez, Guillem Alenyà, Tony Ribeiro, Katsumi Inoue, Carme Torras
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experimental validation is provided that shows improvements over previous work in both simulation and a robotic task. The robotic task involves a dynamic scenario with several agents where a manipulator robot has to clear the tableware on a table. |
| Researcher Affiliation | Academia | David Martínez (1), Guillem Alenyà (1), Tony Ribeiro (2), Katsumi Inoue (3), Carme Torras (1). Affiliations: (1) Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Barcelona, Spain; (2) Laboratoire des sciences du numérique de Nantes (LS2N), Nantes, France; (3) National Institute of Informatics, Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1 Probabilistic LFIT(E, B) Algorithm 2 Operator Selection(Oinput, T) Algorithm 3 Operator Selection Subsumption(Tree O, T) Algorithm 4 V-MIN |
| Open Source Code | No | The paper does not provide a direct link to source code, an explicit statement of code release, or any mention of code in supplementary materials for the methodology it describes. |
| Open Datasets | Yes | Three IPPC 2014 domains were used in the experiments. Note that they were slightly modified to remove redundancy (e.g. a north(?X,?Y) literal is equivalent to south(?Y,?X), so one can be replaced by the other). |
| Dataset Splits | No | The paper describes how input transitions for model learning are generated randomly (e.g., "the state s is constructed by randomly assigning a value (positive or negative) to every literal"), and that the RL approach generates data through interaction (episodes and runs). It does not specify explicit training/test/validation splits for a fixed dataset in the traditional sense. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | PROST (Keller and Eyerich, 2012) is the planner used as it can obtain good results with probabilistic models containing exogenous effects. |
| Experiment Setup | Yes | The learner parameters used were α = 0.01, ϵ = 0.1, ω = 2, δ = 0.05, κ = 1000, and the subsumption was enabled. The V-MIN exploration threshold was ζ = 3 and Vmin was selected and updated by the teacher depending on the robot performance. |
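For readers attempting to reproduce the experiments, the reported settings can be collected into a small configuration sketch. The variable names below are illustrative labels for the paper's symbols (α, ϵ, ω, δ, κ, ζ), not identifiers from any released code:

```python
# Experiment settings as reported in the paper; names are illustrative.
EXPERIMENT_CONFIG = {
    "alpha": 0.01,        # α: learner parameter
    "epsilon": 0.1,       # ϵ: learner parameter
    "omega": 2,           # ω: learner parameter
    "delta": 0.05,        # δ: learner parameter
    "kappa": 1000,        # κ: learner parameter
    "subsumption": True,  # subsumption was enabled
    "zeta": 3,            # ζ: V-MIN exploration threshold
    # Vmin is not fixed here: the teacher selected and updated it
    # online based on robot performance.
}
```

Note that Vmin is deliberately omitted as a constant, since the paper states it was adjusted by the teacher during the runs.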