Feudal Graph Reinforcement Learning

Authors: Tommaso Marzi, Arshjot Singh Khehra, Andrea Cini, Cesare Alippi

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed framework on a graph clustering problem and MuJoCo locomotion tasks; simulation results show that FGRL compares favorably against relevant baselines. Furthermore, an in-depth analysis of the command propagation mechanism provides evidence that the introduced message-passing scheme favors learning hierarchical decision-making policies. In Fig. 3 we show the success rate of each agent in clustering the graph and the median of the Normalized Mutual Information (NMI) score computed across different runs. We report the results for the 4 agents in Fig. 5.
Researcher Affiliation | Academia | Tommaso Marzi (EMAIL), Università della Svizzera italiana, IDSIA; Arshjot Khehra (EMAIL), Università della Svizzera italiana; Andrea Cini (EMAIL), Università della Svizzera italiana, IDSIA; Cesare Alippi (EMAIL), Università della Svizzera italiana, IDSIA, and Politecnico di Milano
Pseudocode | No | The paper describes the methodology only through text, equations (Eq. 2, 3, 4, 5), and a diagram (Fig. 2), without an explicit 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code to reproduce experiments is available online at https://github.com/tommasomarzi/fgrl.
Open Datasets | Yes | We validate our framework on two scenarios, namely a synthetic graph clustering problem inspired by Bianchi et al. (2020) and continuous control environments from the standard MuJoCo locomotion tasks (Todorov et al., 2012), where we follow Huang et al. (2020).
Dataset Splits | No | The paper describes a synthetic graph clustering problem, where graphs are generated with varying parameters (β, Nβ), and continuous control environments from the MuJoCo locomotion tasks (Todorov et al., 2012), where agents interact with a simulator. It does not provide explicit training/validation/test splits, which is typical for simulated or procedurally generated environments.
Hardware Specification | Yes | Experiments were run on a workstation equipped with AMD EPYC 7513 CPUs.
Software Dependencies | No | The paper states that the code relies on open-source libraries and publicly available code from previous works, and mentions the use of the Adam optimizer and PPO, but it does not give version numbers for key software components (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | Table 3 provides detailed hyperparameters, including population size, initial step size, dimension of the state representation, dimension of the hidden layer, activation function, aggregation functions, maximal hierarchy height, and message-passing rounds. Appendix D.4 further specifies PPO hyperparameters such as the learning rate (3e-6), hidden layers ([64, 64] with tanh), discount factor (0.99), clipping value (0.2), policy update epochs (10), batch size (64), updating horizon (2048), and the action standard deviation decay schedule.
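For reference, the NMI score reported for the clustering experiments (Fig. 3) can be computed from two label assignments. Below is a minimal stdlib-only sketch using the sqrt-of-entropies normalization; the paper does not state which normalization variant it uses, so this choice is an assumption.

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """Normalized Mutual Information between two clusterings.
    Uses the sqrt(H_a * H_b) normalization (an assumption; other
    variants normalize by the mean or max of the entropies)."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information I(A;B) from the joint and marginal counts
    mi = sum((c / n) * log((c * n) / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    # Marginal entropies H(A) and H(B)
    ha = -sum((c / n) * log(c / n) for c in ca.values())
    hb = -sum((c / n) * log(c / n) for c in cb.values())
    return mi / sqrt(ha * hb) if ha > 0 and hb > 0 else 1.0

# Identical clusterings (up to relabeling) give NMI = 1
print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 4))  # → 1.0
```

Because NMI is invariant to relabeling the clusters, it suits settings where only the grouping, not the cluster indices, matters.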
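As the Dataset Splits row notes, the clustering data is procedurally generated rather than split into fixed sets. A generic illustration of such a generator is a stochastic-block-model graph; this sketch is for illustration only and does not reproduce the paper's actual generator or its (β, Nβ) parameters.

```python
import random

def sbm_graph(sizes, p_in, p_out, seed=0):
    """Generic stochastic-block-model sketch: nodes within the same
    block connect with probability p_in, across blocks with p_out.
    Illustrative only; not the generator used in the paper."""
    rng = random.Random(seed)
    # Ground-truth block label for every node
    labels = [b for b, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    # Sample each undirected edge independently
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < (p_in if labels[i] == labels[j] else p_out)]
    return labels, edges

# Two blocks of 4 nodes each, dense inside, sparse across
labels, edges = sbm_graph([4, 4], p_in=0.9, p_out=0.05)
```

Fixing the seed makes each sampled graph reproducible, which is how procedurally generated benchmarks typically stand in for a held-out test set.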
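The PPO hyperparameters quoted from Appendix D.4 can be collected into a single config for reimplementation. The sketch below is hypothetical: the key names are illustrative rather than taken from the authors' code, and the small helper only demonstrates the ratio clipping that the 0.2 value parameterizes.

```python
# Hypothetical config mirroring the PPO hyperparameters quoted from
# Appendix D.4; key names are illustrative, not the authors' own.
ppo_config = {
    "learning_rate": 3e-6,
    "hidden_layers": [64, 64],   # two hidden layers with tanh activations
    "activation": "tanh",
    "discount_factor": 0.99,     # gamma
    "clip_range": 0.2,           # PPO clipping value epsilon
    "update_epochs": 10,         # policy update epochs per batch
    "batch_size": 64,
    "horizon": 2048,             # environment steps per policy update
}

def clipped_ratio(ratio, eps=ppo_config["clip_range"]):
    """Clip the probability ratio pi_new/pi_old to [1 - eps, 1 + eps],
    as in PPO's clipped surrogate objective."""
    return max(1.0 - eps, min(ratio, 1.0 + eps))

print(clipped_ratio(1.5))  # → 1.2
```

A ratio inside the clip range passes through unchanged, so the clipping only limits how far a single update can move the policy.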