Relational Reinforcement Learning for Planning with Exogenous Effects

Authors: David Martínez, Guillem Alenyà, Tony Ribeiro, Katsumi Inoue, Carme Torras

JMLR 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, experimental validation is provided that shows improvements over previous work in both simulation and a robotic task. The robotic task involves a dynamic scenario with several agents where a manipulator robot has to clear the tableware on a table." |
| Researcher Affiliation | Academia | David Martínez (1), Guillem Alenyà (1), Tony Ribeiro (2), Katsumi Inoue (3), Carme Torras (1). (1) Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Barcelona, Spain; (2) Laboratoire des sciences du numérique de Nantes (LS2N), Nantes, France; (3) National Institute of Informatics, Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1: Probabilistic LFIT(E, B); Algorithm 2: Operator Selection(Oinput, T); Algorithm 3: Operator Selection Subsumption(Tree O, T); Algorithm 4: V-MIN |
| Open Source Code | No | The paper does not provide a direct link to source code or an explicit statement of code release, nor does it mention code in supplementary materials for the described methodology. |
| Open Datasets | Yes | Three IPPC 2014 domains were used in the experiments. They were slightly modified to remove redundancy (e.g., a north(?X,?Y) literal is equivalent to south(?Y,?X), so one can be replaced by the other). |
| Dataset Splits | No | Input transitions for model learning are generated randomly ("the state s is constructed by randomly assigning a value (positive or negative) to every literal"), and the RL approach generates data through interaction (episodes and runs). The paper does not specify explicit training/validation/test splits for a fixed dataset in the traditional sense. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments. |
| Software Dependencies | No | PROST (Keller and Eyerich, 2012) is the planner used, as it obtains good results with probabilistic models containing exogenous effects. |
| Experiment Setup | Yes | The learner parameters used were α = 0.01, ε = 0.1, ω = 2, δ = 0.05, κ = 1000, and subsumption was enabled. The V-MIN exploration threshold was ζ = 3, and Vmin was selected and updated by the teacher depending on the robot's performance. |
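Two of the data-preparation steps summarized above lend themselves to a short sketch: constructing a state by randomly assigning a truth value to every literal, and removing the north/south redundancy from the IPPC 2014 domains. The snippet below is illustrative only; the literal names and helper functions are assumptions, since the paper's code is not released.

```python
import random


def random_state(literals, rng=None):
    """Build a state by randomly assigning a value (positive or negative)
    to every literal, as described for generating input transitions."""
    rng = rng or random.Random(0)
    return {lit: rng.choice([True, False]) for lit in literals}


def normalize(literal):
    """Rewrite south(Y,X) as the equivalent north(X,Y), mirroring the
    redundancy removal applied to the IPPC 2014 domains."""
    if literal.startswith("south("):
        y, x = (a.strip() for a in literal[len("south("):-1].split(","))
        return f"north({x},{y})"
    return literal


# Hypothetical grounded literals, for illustration only.
literals = [normalize(l) for l in ["north(a,b)", "south(b,a)", "clear(a)"]]
state = random_state(literals)
```

Note that after normalization the two redundant literals collapse to the same key, so the state dictionary contains only the canonical `north(a,b)` form.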