Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning

Authors: Tongzhou Mu, Kaixiang Lin, Feiyang Niu, Govind Thattai

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental studies on four levels of complex text-based games have demonstrated the superiority of the proposed method compared to the state-of-the-art. We evaluate our method on TextWorld, which is a framework for designing text-based interactive games. More specifically, we use the TextWorld games generated by GATA (Adhikari et al., 2020). Table 3 shows the normalized scores of different methods on both the training and test environments in TextWorld. Table 4 shows the performance of vanilla RL and our method under noisy input graphs generated in the above-mentioned way. In this section, we study the contributions of different modules in our method.
Researcher Affiliation | Collaboration | Tongzhou Mu (EMAIL), Department of Computer Science and Engineering, University of California San Diego; Kaixiang Lin (EMAIL), Amazon; Feiyang Niu (EMAIL), Amazon; Govind Thattai (EMAIL), Amazon
Pseudocode | No | The paper describes the two-step hybrid decision-making process and the rule-mining process in detail using natural language and mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate our method on TextWorld, which is a framework for designing text-based interactive games. More specifically, we use the TextWorld games generated by GATA (Adhikari et al., 2020).
Dataset Splits | Yes | The games have four different difficulty levels, and each difficulty level contains 20 training, 20 validation, and 20 test environments, which are sampled from a distribution based on the difficulty level.
Hardware Specification | No | The paper mentions training models and experiments but does not provide any specific details about the hardware used (e.g., GPU models, CPU types, or cloud-computing instance specifications).
Software Dependencies | No | The paper mentions several software components and frameworks, such as fastText (Mikolov et al., 2017), Relational-GCN, DQN (Mnih et al., 2015), GCN, and GTN (Yun et al., 2019). However, it does not specify version numbers for these or other ancillary software components.
Experiment Setup | Yes | To collect the demonstration dataset, we first train a teacher policy with DQN (Mnih et al., 2015) in the training environments, which converges to a near-optimal solution. The trained teacher policy is used to collect 300K samples through interaction with the environment, and we label them with the taken actions, as illustrated in Sec. 4.3.1. When collecting the demonstration dataset, we use an ε-greedy exploration strategy to increase the diversity of states. We want to train a classifier f_p(s; θ) = k, where k ∈ {1, 2, ..., K} is an action type. This is a conventional classification problem that can be solved by minimizing the cross-entropy loss: θ* = arg min_θ Σ_j −log f_θ^{k_j}(s_j), where f_θ^{k}(s) denotes the predicted probability of action type k. Then we can obtain ASE(A_k) by selecting the edges with importance higher than a threshold, i.e., ASE(A_k) = {e | I_a(e) > τ}, where τ is a hyperparameter shared across all action types.
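The two steps quoted above — fitting an action-type classifier by cross-entropy and thresholding edge importances to get ASE(A_k) — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the linear softmax model, the toy demonstration data, and the edge-importance values are all hypothetical stand-ins for the paper's graph-based state encoder and learned importances.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_action_type_classifier(S, k, num_types, lr=0.5, steps=500):
    """Minimize cross-entropy: theta* = argmin_theta sum_j -log f_theta^{k_j}(s_j).

    S: (n, d) state features; k: (n,) integer action-type labels in {0..K-1}.
    A linear softmax model stands in for the paper's classifier f_p(s; theta).
    """
    n, d = S.shape
    W = np.zeros((d, num_types))
    Y = np.eye(num_types)[k]            # one-hot labels
    for _ in range(steps):
        P = softmax(S @ W)
        W -= lr * S.T @ (P - Y) / n     # gradient of mean cross-entropy
    return W

def select_ase(edge_importance, tau):
    """ASE(A_k) = {e : I_a(e) > tau} -- keep edges above the shared threshold."""
    return {e for e, imp in edge_importance.items() if imp > tau}

# Toy demonstration data (assumed, not from the paper): 2 action types
# determined by the sign of the first state feature.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 4))
k = (S[:, 0] > 0).astype(int)
W = train_action_type_classifier(S, k, num_types=2)
acc = (softmax(S @ W).argmax(axis=1) == k).mean()

# Hypothetical edge importances for one action type; tau filters them.
edges = {("kitchen", "fridge"): 0.9, ("fridge", "apple"): 0.2}
ase = select_ase(edges, tau=0.5)
```

Because the toy labels are a deterministic linear function of the state, the classifier fits them almost perfectly; in the paper, the same cross-entropy objective is applied to the teacher's 300K labeled demonstrations instead.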