ACES: Automatic Cohort Extraction System for Event-Stream Datasets
Authors: Justin Xu, Jack Gallifant, Alistair Johnson, Matthew McDermott
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To establish an overview of the computational profile of ACES, the collection of tasks from Table 1 was extracted on MIMIC-IV. ... Table 2: Performance statistics for various common predictive tasks on a single MEDS shard of MIMIC-IV. ... Table B.1: Quantitative comparison of ACES and other comparable cohort extraction tools across datasets and tasks. |
| Researcher Affiliation | Academia | Justin Xu University of Oxford EMAIL Jack Gallifant Massachusetts Institute of Technology EMAIL Alistair E. W. Johnson University of Toronto EMAIL Matthew B. A. McDermott Harvard Medical School EMAIL |
| Pseudocode | No | The paper describes the 'ACES recursive algorithm' in Section 2.1 and illustrates its workflow in Figure 3. However, it presents a conceptual description and a flow diagram, not structured pseudocode or an algorithm block in a code-like format. |
| Open Source Code | Yes | ACES is available at: https://github.com/justin13601/aces. |
| Open Datasets | Yes | To establish an overview of the computational profile of ACES, the collection of tasks from Table 1 was extracted on MIMIC-IV. The MIMIC-IV MEDS schema has approximately 50,000 patients per shard... Additionally, for users dealing with large datasets, ACES can also be run over a collection of sharded files... Using the OMOP version of the MIMIC-IV Demo, as well as a synthetic dataset of 1,000 patients generated using Synthea (Walonoski et al., 2017) and converted into OMOP |
| Dataset Splits | No | The paper describes the extraction of patient cohorts and evaluates the system's computational performance. While it defines 'input' and 'target' windows for tasks, it does not specify how the *extracted cohorts* are split into training, validation, or test sets for downstream machine learning model development, which is typically required for reproducibility of experimental results. |
| Hardware Specification | Yes | All experiments were executed on a Linux server with 36 cores and 340 GBs of RAM available. All experiments were conducted on a default A100 GPU instance with 84 GB of RAM and 12 CPU cores from Google Cloud Platform's Compute Engine. |
| Software Dependencies | No | The paper mentions 'pip install es-aces' for installing the ACES library and refers to the 'Hydra framework (Yadan, 2019)' for CLI configurations. However, it does not provide specific version numbers for Python, Hydra, or any other critical software libraries or dependencies that would be needed to precisely replicate the experimental environment. |
| Experiment Setup | Yes | Define Task: A task configuration file is required to define the task that the user wishes to extract. This configuration language is simple, clear, yet flexible, permitting users to rapidly share and iterate over task definitions for their clinical settings. Configuration specification is given in Section 2.3. ... Figure 4: Example configuration file for the binary prediction of in-hospital mortality 48 hours after admission. References to predicates and windows are italicized and bolded. (This figure details specific parameters like 'trigger', 'target', 'start', 'end', 'label', 'has', and temporal constraints like 'trigger + 24 hours' or 'gap.end'). |
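The configuration fields quoted above (a `trigger` event, windows with `start`/`end` bounds, `has` constraints, and a `label`) can be illustrated with a hypothetical sketch of the in-hospital-mortality task from Figure 4. The field names below come from the paper's description; the exact YAML syntax and predicate codes are assumptions, not verbatim ACES configuration.

```yaml
# Hypothetical ACES-style task config for in-hospital mortality
# prediction. Field names follow Figure 4 of the paper; the precise
# syntax and predicate codes are illustrative assumptions.
predicates:
  admission:
    code: ADMISSION           # placeholder event code
  death:
    code: DEATH               # placeholder event code

trigger: admission            # prediction anchored at each admission

windows:
  gap:
    start: trigger
    end: trigger + 24 hours   # temporal constraint quoted in Figure 4
    has:
      death: (None, 0)        # exclude patients who die during the gap
  target:
    start: gap.end            # window reference quoted in Figure 4
    end: null                 # extends to the end of the record
    label: death              # binary in-hospital mortality label
```

A sketch like this is what the review cell refers to as the "task configuration file": a declarative description of predicates, a trigger event, and constrained windows, from which ACES extracts the cohort and labels.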