Feudal Graph Reinforcement Learning
Authors: Tommaso Marzi, Arshjot Singh Khehra, Andrea Cini, Cesare Alippi
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed framework on a graph clustering problem and MuJoCo locomotion tasks; simulation results show that FGRL compares favorably against relevant baselines. Furthermore, an in-depth analysis of the command propagation mechanism provides evidence that the introduced message-passing scheme favors learning hierarchical decision-making policies. In Fig. 3 we show the success rate of each agent in clustering the graph and the median of the Normalized Mutual Information (NMI) score computed across different runs. We report the results for the 4 agents in Fig. 5. |
| Researcher Affiliation | Academia | Tommaso Marzi (Università della Svizzera italiana, IDSIA); Arshjot Khehra (Università della Svizzera italiana); Andrea Cini (Università della Svizzera italiana, IDSIA); Cesare Alippi (Università della Svizzera italiana, IDSIA; Politecnico di Milano) |
| Pseudocode | No | The paper only describes the methodology using text, equations (Eq. 2, 3, 4, 5), and a diagram (Fig. 2), without any explicit 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code to reproduce experiments is available online at https://github.com/tommasomarzi/fgrl. |
| Open Datasets | Yes | We validate our framework on two scenarios, namely a synthetic graph clustering problem inspired by Bianchi et al. (2020) and continuous control environments from the standard MuJoCo locomotion tasks (Todorov et al., 2012), where we follow Huang et al. (2020). |
| Dataset Splits | No | The paper describes a synthetic graph clustering problem where graphs are generated with varying parameters (β, Nβ) and continuous control environments from MuJoCo locomotion tasks (Todorov et al., 2012) where agents interact with a simulator. It does not provide explicit training/test/validation dataset splits, as is common for simulated or procedurally generated environments. |
| Hardware Specification | Yes | Experiments were run on a workstation equipped with AMD EPYC 7513 CPUs. |
| Software Dependencies | No | The paper mentions that the code was developed relying on open-source libraries and publicly available code of previous works, and states the use of Adam optimizer and PPO, but does not provide specific version numbers for key software components or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table 3 provides detailed hyperparameters including Population size, Initial step size, Dimension of state representation, Dimension of hidden layer, Activation function, Aggregation functions, Maximal hierarchy height, and Message-passing rounds. Appendix D.4 further specifies PPO hyperparameters such as learning rate (3e-6), hidden layers ([64, 64] with tanh), discount factor (0.99), clipping value (0.2), policy update epochs (10), batch size (64), updating horizon (2048), and action standard deviation decay schedule. |
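As an illustration, the PPO hyperparameters reported in Appendix D.4 can be gathered into a single configuration object. This is a sketch only: the key names below are our own, not identifiers from the authors' released code at https://github.com/tommasomarzi/fgrl.

```python
# Hedged sketch of the PPO setup reported in Appendix D.4 of the paper.
# Key names are illustrative assumptions, not the authors' actual config schema.
ppo_config = {
    "learning_rate": 3e-6,        # Adam optimizer
    "hidden_layers": [64, 64],    # two hidden layers with tanh activations
    "activation": "tanh",
    "discount_factor": 0.99,      # gamma
    "clip_range": 0.2,            # PPO clipping value
    "update_epochs": 10,          # policy update epochs per iteration
    "batch_size": 64,
    "horizon": 2048,              # environment steps collected per update
}

# Sanity-check a few of the reported values.
print(ppo_config["learning_rate"], ppo_config["discount_factor"])
```

Such a dictionary can be passed to most PPO implementations after mapping the keys to the library's own argument names.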