A Disaster Response System based on Human-Agent Collectives

Authors: Sarvapali D. Ramchurn, Trung Dong Huynh, Feng Wu, Yuki Ikuno, Jack Flann, Luc Moreau, Joel E. Fischer, Wenchao Jiang, Tom Rodden, Edwin Simpson, Steven Reece, Stephen Roberts, Nicholas R. Jennings

JAIR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We individually validate each of these elements of HAC-ER and show how they perform against standard (non-HAC) baselines and also elaborate on the evaluation of the overall system. This process generated a number of new quantitative and qualitative results but also raised a number of new research questions." See Sections 4.3 (Evaluation: Prediction with Sparse Reports) and 6.3 (Evaluation).
Researcher Affiliation | Academia | Dept. of Electronics and Computer Science, University of Southampton, UK; Mixed Reality Lab, Dept. of Computer Science, University of Nottingham, UK; Machine Learning Research Group, Dept. of Engineering Science, University of Oxford, UK; Imperial College London, UK.
Pseudocode | Yes | Section 5.1.1, "The Max-Sum Algorithm". From agent i to task j: $q_{i \to j}(x_i) = \alpha_{ij} + \sum_{k \in M(i) \setminus j} r_{k \to i}(x_i)$ (1). From task j to agent i: $r_{j \to i}(x_i) = \max_{x_j \setminus x_i} \big[ U_j(x_j) + \sum_{k \in N(j) \setminus i} q_{k \to j}(x_k) \big]$ (2). At convergence, each agent's marginal function satisfies $\sum_{j \in M(i)} r_{j \to i}(x_i) = \max_{\mathbf{x} \setminus x_i} \sum_{j=1}^{M} U_j(x_j)$ (3).
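The three max-sum messages above can be sketched on a toy factor graph. This is an illustrative implementation only, not the paper's code: the two agents, two task factors, and their utility values are hypothetical, chosen so the graph is a tree and the algorithm converges exactly.

```python
import itertools

# Agents a1, a2 each choose a task id; each task factor U_j scores the
# joint choice of its connected agents (hypothetical utilities).
domains = {"a1": [0, 1], "a2": [0, 1]}
tasks = {
    0: {"agents": ["a1"],
        "U": lambda a: 2.0 if a["a1"] == 0 else 0.0},
    1: {"agents": ["a1", "a2"],
        "U": lambda a: 4.0 if (a["a1"] == 1 and a["a2"] == 1)
             else 1.0 if a["a2"] == 1 else 0.0},
}

# M(i): task factors connected to agent i
M = {i: [j for j, t in tasks.items() if i in t["agents"]] for i in domains}

q = {(i, j): {x: 0.0 for x in domains[i]} for i in domains for j in M[i]}
r = {(j, i): {x: 0.0 for x in domains[i]} for j in tasks
     for i in tasks[j]["agents"]}

for _ in range(10):
    # Eq. (1): q_{i->j}(x_i) = alpha_ij + sum_{k in M(i)\j} r_{k->i}(x_i)
    for (i, j) in q:
        for x in domains[i]:
            q[(i, j)][x] = sum(r[(k, i)][x] for k in M[i] if k != j)
        mean = sum(q[(i, j)].values()) / len(domains[i])  # alpha_ij normaliser
        for x in domains[i]:
            q[(i, j)][x] -= mean
    # Eq. (2): r_{j->i}(x_i) = max over the other agents' choices of
    #          U_j(x_j) + sum_{k in N(j)\i} q_{k->j}(x_k)
    for (j, i) in r:
        others = [k for k in tasks[j]["agents"] if k != i]
        for x in domains[i]:
            best = float("-inf")
            for combo in itertools.product(*(domains[k] for k in others)):
                assign = dict(zip(others, combo), **{i: x})
                best = max(best, tasks[j]["U"](assign)
                           + sum(q[(k, j)][assign[k]] for k in others))
            r[(j, i)][x] = best

# Eq. (3): each agent picks the value maximising sum_{j in M(i)} r_{j->i}(x_i)
choice = {i: max(domains[i], key=lambda x, i=i: sum(r[(j, i)][x] for j in M[i]))
          for i in domains}
print(choice)  # both agents team up on the higher-utility task 1
```

On this tree-structured toy graph the messages converge after a few iterations and both agents select task 1, whose joint utility (4.0) beats any other assignment.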
Open Source Code | No | "To help make these decisions, we developed interfaces for mixed-initiative task allocation (provided as a free tool at www.hacplanning.com)." The paper describes www.hacplanning.com as a free tool but does not state that the source code for the methodology is open source, nor does it link to a code repository.
Open Datasets | Yes | "In our example we use the building damage assessment provided by UNOSAT for the Haiti 2010 earthquake (Corbane et al., 2011) to learn the length-scale for predicting emergencies," along with structured data extracted from OpenStreetMap.
Dataset Splits | No | "In the absence of ground truth data, we establish a gold-standard test set by training IBCC on 2723 reports, placed into 675 discrete locations on a 100 × 100 grid. Each location has on average approximately 4 reports per grid square. We then evaluate how effective our Bayesian heatmap method is at replicating these results with sparse subsets of the noisy reports. The experiment was repeated 20 times using different subsets of the complete Ushahidi dataset." The paper describes building a gold-standard test set and evaluating on sparse subsets, but gives no train/test/validation percentages or counts, nor a reproducible procedure for creating the splits.
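The repeated sparse-subset protocol quoted above can be sketched as follows. This is a hedged illustration, not the paper's IBCC/Bayesian heatmap: the synthetic report coordinates, the 10% subset size, the hot-cell threshold, and the recall score are all assumptions introduced here to show the shape of the evaluation loop.

```python
import random

random.seed(0)
GRID = 100  # reports are binned onto a 100 x 100 grid, as in the paper
reports = [(random.randrange(GRID), random.randrange(GRID))
           for _ in range(2723)]  # 2723 reports, locations synthetic

def hot_cells(subset, threshold=2):
    """Grid cells with at least `threshold` reports (stand-in for a heatmap)."""
    counts = {}
    for cell in subset:
        counts[cell] = counts.get(cell, 0) + 1
    return {c for c, n in counts.items() if n >= threshold}

gold = hot_cells(reports)  # gold standard built from the full report set

scores = []
for rep in range(20):  # 20 repetitions, as described in the paper
    subset = random.sample(reports, len(reports) // 10)  # sparse 10% subset
    pred = hot_cells(subset, threshold=1)
    scores.append(len(pred & gold) / len(gold))  # recall of gold hot cells

mean_recall = sum(scores) / len(scores)
print(round(mean_recall, 3))
```

A real replication would substitute IBCC and the Bayesian heatmap for `hot_cells`, and the Ushahidi reports for the synthetic coordinates; the loop structure (full-data gold standard, 20 random sparse subsets, per-subset agreement score) is what the quoted text specifies.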
Hardware Specification | No | No hardware details (CPU or GPU models, cloud computing specifications) are given for running the experiments or training the models. The paper mentions UAVs, mobile phones, and tablets as components of the disaster response system, but not the hardware used to run the machine learning algorithms, simulations, or planning agents.
Software Dependencies | No | No software dependencies with version numbers are given. The paper discusses various algorithms and models (e.g., IBCC, Gaussian process classification, max-sum, UCT, k-RMMDP, TPP) but does not name the libraries, frameworks, or programming-language versions used to implement them.
Experiment Setup | Yes | "This plot shows a peak around length-scale = 18, indicating the value that is most supported by the building damages data." "The size of the game area on the local university campus was 400 by 400 meters. There were two drop-off zones and 20 targets in each trial. There were four targets for each of the four target types."
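Picking a length-scale by finding the value "most supported by the data" can be illustrated with a simple model-selection loop. This sketch is an assumption-laden stand-in for the paper's approach: it scores candidate length-scales by leave-one-out log-likelihood of a Gaussian kernel density over synthetic 1-D damage coordinates, whereas the paper fits its model to the UNOSAT building damage assessments.

```python
import math
import random

random.seed(1)
# Hypothetical 1-D damage-report coordinates (synthetic, spread ~18)
points = [random.gauss(50, 18) for _ in range(200)]

def loo_loglik(ls):
    """Leave-one-out log-likelihood of a Gaussian KDE with length-scale ls."""
    total = 0.0
    norm = (len(points) - 1) * ls * math.sqrt(2 * math.pi)
    for i, x in enumerate(points):
        dens = sum(math.exp(-((x - y) ** 2) / (2 * ls ** 2))
                   for j, y in enumerate(points) if j != i) / norm
        total += math.log(dens + 1e-12)  # guard against log(0)
    return total

candidates = list(range(2, 40, 2))
best = max(candidates, key=loo_loglik)  # the peak analogous to the plot
print(best)
```

The shape of the curve is what matters: likelihood rises, peaks at the length-scale best supported by the data, and falls off as the density over- or under-smooths, mirroring the peak the quoted plot shows at 18 for the real damage data.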