reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Belief Tracking for Planning with Sensing: Width, Complexity and Approximations

Authors: B. Bonet, H. Geffner

JAIR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The power of the last algorithm, beam tracking, will be shown empirically over large instances of problems such as Minesweeper, Battleship, and Wumpus, where state-of-the-art performance is obtained in real-time by combining the belief tracking algorithm with simple heuristics for action selection.1 ... We have tested beam tracking over large instances of Battleship, Minesweeper, and Wumpus, in combination with simple heuristics for action selection that make use of the computed beliefs.
Researcher Affiliation	Academia	Blai Bonet EMAIL Departamento de Computación Universidad Simón Bolívar Caracas, Venezuela Hector Geﬀner EMAIL ICREA & Universitat Pompeu Fabra Roc Boronat 138 08018 Barcelona, Spain
Pseudocode	No	The paper describes various belief tracking algorithms (factored belief tracking, causal belief tracking, beam tracking) and their mathematical formulations (e.g., equations 1, 2, 5, 6, 7). However, these algorithms are presented as textual descriptions and equations, not in structured pseudocode blocks or algorithm listings with numbered steps.
Open Source Code	Yes	A real-time animation of the algorithm for several instances of Minesweeper can be seen in https: //www.youtube.com/watch?v=U98ow4n87RA, while all the source code and graphical interfaces can be obtained at http://code.google.com/p/belief-tracking.
Open Datasets	No	The paper uses well-known game environments like Battleship, Minesweeper, and Wumpus. It describes how problem instances are generated (e.g., "k randomly-placed mines" for Minesweeper, "randomly placed" wumpus and pits for Wumpus). While these games are public concepts, the specific instances or datasets used for the experiments are generated by the authors or adapted from simulators, and no concrete access information (link, DOI, specific citation to a dataset repository) for these generated instances is provided.
Dataset Splits	No	The paper describes evaluating algorithms on multiple randomly generated instances rather than predefined dataset splits. For example, for Battleship, it states: "calculated over 10,000 random instances for each board". For Minesweeper and Wumpus, it mentions "Average results over 1,000 runs are shown" and "1,000 evaluations for diﬀerent initial conﬁgurations where the wumpus, pits and gold are randomly placed." This indicates repeated simulations on dynamically generated instances rather than standard train/test/validation splits.
Hardware Specification	Yes	The experiments were run on a Xeon Woodcrest 5140 CPU running at 2.33 GHz and with 8 GB of RAM.
Software Dependencies	No	The paper mentions using a "Java Tewnta framework" for graphical interfaces and adapting a "wumpuslite JAVA simulator". However, it does not provide specific version numbers for these tools or any other software libraries or programming languages used in the implementation of their methodology.
Experiment Setup	Yes	Table 1 shows results for conformant and contingent Ring problems obtained by combining factored belief tracking with simple heuristics. Each data point in panel (c) for the contingent problem is the average (and sample standard deviation) over 1,000 random instances. Times are in seconds... The algorithm was evaluated on diﬀerent instances with grids n n for n = 4, 6, 8, . . . , 20, each with a number of pits equal to (n 4)/2. For each grid size, we performed 1,000 evaluations for diﬀerent initial conﬁgurations where the wumpus, pits and gold are randomly placed. An instance of this game may turn unsolvable because the gold is isolated from the agent by pits, because the agent ﬁnds itself in a position where there is no safe movement, or because the agent exceeded the maximum number of actions (set to 3 times the number of cells in the grid).