reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Planning with Perspectives -- Decomposing Epistemic Planning using Functional STRIPS

Authors: Guang Hu, Tim Miller, Nir Lipovetzky

JAIR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We ran evaluations on well-known epistemic planning benchmarks to compare an existing state-of-the-art planner, and on new scenarios that demonstrate the expressiveness of the PWP approach. The results show that our PWP planner scales significantly better than the state-of-the-art planner that we compared against, and can express problems more succinctly.
Researcher Affiliation	Academia	Guang Hu, Tim Miller, Nir Lipovetzky; School of Computing and Information Systems, The University of Melbourne, Australia
Pseudocode	No	The paper provides concrete code examples in Appendix 7.1 (e.g., C++ functions for 'sees' in different domains), but it does not include structured pseudocode or algorithm blocks in a language-agnostic format within the main text.
Open Source Code	Yes	The source code of our implementation along with all experiments can be found at https://github.com/guanghuhappysf128/benchmarks.
Open Datasets	Yes	We ran evaluations on well-known epistemic planning benchmarks to compare an existing state-of-the-art planner, and on new scenarios that demonstrate the expressiveness of the PWP approach. We implement three benchmark problems, Corridor (Kominis & Geffner, 2015), Gossip (Cooper et al., 2016), and Grapevine (Muise et al., 2015a) to compare the computational performance, and model two new domains, Big Brother Logic and Social-media Network, to examine the expressiveness of our model. ... our experiments, we compare these three methods state-based, action-based (full) and action-based (relative) with the baseline of Cooper et al. (2019)'s method using their tool for generating their Gossip Generator (footnote 8). Their generator compiles Gossip problems into classical planning problems... Footnote 8: Downloadable from https://github.com/FaustineMaffre/GossipProblem-PDDL-generator
Dataset Splits	No	The paper describes using various planning benchmarks (Corridor, Grapevine, Gossip) and custom domains (BBL, Social-media Network) for evaluation. It varies parameters like the number of agents, depth of queries, and number of goals to test scalability. However, it does not mention specific training, validation, or test dataset splits in the context of typical machine learning experimentation.
Hardware Specification	Yes	We ran the problems with both planners on a Linux machine with 8 CPUs (Intel Core i7-7700K CPU @ 4.20GHz × 8) and 16 gigabytes of memory.
Software Dependencies	No	We use F-STRIPS as the modelling language, and BFWS(R0) (Francès et al., 2017) as the planner. ... In our implementation, external functions are programmed in C++ for scalability and flexibility. ... in this domain, there was not a significant performance difference with respect to the original planner used by Muise et al. (2015a), the FF planner. ... The planner is available through https://github.com/aig-upf/2017-planning-with-simulators. We used options --driver sbfws --options bfws.rs=none
Experiment Setup	No	The paper describes the general approach to encoding problems in F-STRIPS and the use of external functions. It provides goal conditions and example plans for various scenarios. However, it does not explicitly provide hyperparameter values, specific optimizer settings, learning rates, or other detailed system-level training configurations typically found in experimental setups for machine learning or complex simulations.