reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Planning with Critical Section Macros: Theory and Practice

Authors: Lukas Chrpa, Mauro Vallati

JAIR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide an extensive and detailed empirical evaluation on a broad range of domains. The experimental analysis is presented in Section 7.
Researcher Affiliation	Academia	Lukáš Chrpa EMAIL Faculty of Electrical Engineering, Czech Technical University in Prague, Jugoslávských partyzánů 1580/3, Prague, 160 00, Czechia; Mauro Vallati EMAIL School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield, HD1 3DH, United Kingdom
Pseudocode	Yes	Algorithm 1 Assembling Macro-actions from a sequence of actions ... Algorithm 2 Learning (SpSl) CSMs from training plans ... Algorithm 3 A high-level routine for learning SpSl CSMs ... Algorithm 4 A high-level routine for learning compound CSMs
Open Source Code	Yes	Our code and benchmarks can be found at: https://github.com/lchrpa/CSMs.
Open Datasets	Yes	We considered a range of well-known benchmark domains from both deterministic and learning tracks of IPCs. In particular: Elevators, Floortile, GED, Hiking, Termes, and Transport from the deterministic track of IPCs 2011, 2014 and 2018, and Barman, Blocksworld (Bw), Depots, Gold Miner (Gold), Gripper, Matching-Bw, Rovers, Sokoban, and Thoughtful from the learning track of IPCs 2008 and 2011. We have also considered the Storage domain from IPC 2006, that was used for evaluating the BloMa technique (Chrpa & Siddiqui, 2015). ... Since 1998, the International Planning Competition (IPC) has been organised ... Footnote 1: http://ipc.icaps-conference.org
Dataset Splits	Yes	As testing instances, for each domain we used those exploited in IPCs. There are 20 instances for the domains included in the deterministic tracks (except Storage), and 30 instances for the learning track benchmarks and Storage. ... we considered 6 training tasks per each domain such that their plan length was mostly within 30-80 actions. One training plan was considered per training task.
Hardware Specification	Yes	All the experiments were conducted on Intel Xeon E5-2620 v4 2.10 GHz with 32GB RAM.
Software Dependencies	No	The paper lists specific planning engines by name (e.g., FF, LAMA, Probe, MpC, Mercury, Yahsp3, FDSS 2018, Dual BFWS) with their corresponding citations, but does not provide version numbers for these or for any other software libraries or dependencies. For example, it mentions 'FDSS 2018' but not its full version, nor any other libraries with versions.
Experiment Setup	Yes	We considered 6 training tasks per each domain such that their plan length was mostly within 30-80 actions. One training plan was considered per training task. For each individual domain, out of all considered planners, a planner which generates best quality training plans... is selected to generate training plans for that domain. ... The thresholds for underrepresented macros, ν1 and ν2 (see Algorithms 2, 3 and 4) were set according to results of preliminary experiments. In particular, ν1 was set to maximum of 1/2 of the number of the training tasks and 1/3 occurrences of the most frequent macro, while ν2 was set to the number of training tasks. ... For each testing task a time limit of 900 seconds and a memory limit of 4 GB is applied.
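
Note on the thresholds quoted in the Experiment Setup row: the rule for ν1 and ν2 can be illustrated with a small Python sketch. This is not the authors' code; the function name `macro_thresholds` and the dictionary of macro occurrence counts are hypothetical, and the excerpt does not say whether fractional values are rounded, so the sketch leaves them unrounded.

```python
def macro_thresholds(num_training_tasks, macro_occurrences):
    """Illustrative computation of the thresholds nu1 and nu2.

    num_training_tasks: number of training tasks (6 per domain in the paper).
    macro_occurrences:  hypothetical mapping from macro name to the number of
                        times that macro occurs in the training plans.
    """
    # Occurrence count of the most frequent macro (0 if no macros were found).
    most_frequent = max(macro_occurrences.values(), default=0)

    # nu1: the larger of 1/2 of the number of training tasks and
    # 1/3 of the occurrences of the most frequent macro.
    nu1 = max(num_training_tasks / 2, most_frequent / 3)

    # nu2: the number of training tasks.
    nu2 = num_training_tasks
    return nu1, nu2


# Example with made-up occurrence counts and 6 training tasks.
nu1, nu2 = macro_thresholds(6, {"macro_a": 12, "macro_b": 5})
print(nu1, nu2)
```

With 6 training tasks, ν1 stays at 3 until the most frequent macro occurs more than 9 times, after which the 1/3 term dominates.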