reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Solving Large Problems with Heuristic Search: General-Purpose Parallel External-Memory Search

Authors: Matthew Hatem, Ethan Burns, Wheeler Ruml

JAIR 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluated the performance of A-HBDDD on the sliding-tile puzzle. We compared A-HBDDD with highly optimized implementations of internal A, IDA and Asynchronous Parallel IDA* (AIDA, Reinefeld & Schnecke, 1994). ... The plots in Figure 4 show a comparison between PEDAL, IDACR, AIDACRand BFHS-DDD on the square root version of the 100 tiles instances used by Korf (1985). ... Table 4 shows results for solving the 75 easiest instances of BAli BASE Reference Set 1 (the instances that A is able to solve) for comparison.
Researcher Affiliation	Academia	Matthew Hatem mhatem at cs.unh.edu Ethan Burns eaburns at cs.unh.edu Wheeler Ruml ruml at cs.unh.edu Department of Computer Science University of New Hampshire Durham, NH 03824 USA
Pseudocode	Yes	Figure 1: Pseudocode for A-HBDDD. ... Figure 2: Pseudocode for PEDAL. ... Figure 8: Pseudocode for the PE2A algorithm.
Open Source Code	Yes	We also provide a detailed implementation guide with source code both for generic parallel disk-based best-ﬁrst search and for Multiple Sequence Alignment with a biologically accurate cost function. ... Finally, we invite the reader to examine our source code (available at Hatem, Burns, & Ruml, 2014) ... Source code is available at Hatem et al. (2014).
Open Datasets	Yes	We used the 100 instances from Korf (1985) and the Manhattan distance heuristic. ... A popular benchmark for MSA algorithms is BAli BASE, a database of multiple sequence alignment problems speciﬁcally designed for the evaluation and comparison of multiple sequence alignment programs (Thompson, Plewniak, & Poch, 1999).
Dataset Splits	Yes	We used the 100 instances from Korf (1985) and the Manhattan distance heuristic. ... Of particular interest to our study is a set of instances known as Reference Set 1. ... Table 4 shows results for solving the 75 easiest instances of BAli BASE Reference Set 1 ... Table 5 shows results for solving the 7 most diﬃcult instances of BAli BASE Reference Set 1 using the scalable external memory algorithms, Parallel A-HBDDD and PE2A.
Hardware Specification	Yes	The results for A* were generated on Machine-A, a dual quad-core (8 cores) machine with Intel Xeon X5550 2.66 GHz processors and 48 GB RAM. All other results were generated on Machine-B, a dual hexa-core machine (12 cores) with Xeon X5660 2.80 GHz processors, 12 GB of RAM and 12 320 GB disks. ... We used Machine-A, a dual quad-core (8 cores) machine with Intel Xeon X5550 2.66 GHz processors and 48 GB RAM, to solve the hardest two instances (gal4 and 1pam A).
Software Dependencies	No	We wrote all algorithms in Java, however, many of the implementation details are applicable to other programming languages. ... The Java implementations use many of the same optimizations. In addition we use the High Performance Primitive Collection (HPPC) in place of the Java Collections Framework (JCF) for many of our data structures. ... For the disk-based algorithms we used Java s New I/O (NIO) for high performance I/O operations. ... We used Java s concurrency.utils package to manage multiple threads with instances of the Thread Pool Executor class. Explanation: The paper mentions specific software components like Java, HPPC, NIO, and concurrency.utils, but it does not provide specific version numbers for any of these, which is required by the instructions for a 'Yes' classification.
Experiment Setup	Yes	Our version of AIDA* used 24 threads and generated a frontier of 24,000 nodes, using an A* search, to seed the parallel phase of the search. ... For the algorithms using HBDDD, we selected a hash function that maps states to buckets by ignoring all except the position of the blank, one and two tiles. This hash function results in 3,360 buckets... For PEDAL... We found a value of 1/2 worked well in practice for the domains tested. ... We tried a range of values for C from 0 to 500 and found that 100 performed best. ... We found IDDP to be sensitive to the number of bins used to represent the distribution that is used to estimate the next bound; 500 bins performed well. ... We used the pair-wise (hall,2) heuristic to solve 80 of the 82 instances in the BAli BASE Reference Set 1 benchmark. ... For the 75 easiest instances, we used a hash function that used the lattice coordinate of the longest sequence. For all other instances, we used a hash function that used the lattice coordinates of the two longest sequences.