ACES: Automatic Cohort Extraction System for Event-Stream Datasets
Authors: Justin Xu, Jack Gallifant, Alistair Johnson, Matthew McDermott
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To establish an overview of the computational profile of ACES, the collection of tasks from Table 1 was extracted on MIMIC-IV. ... Table 2: Performance statistics for various common predictive tasks on a single MEDS shard of MIMIC-IV. ... Table B.1: Quantitative comparison of ACES and other comparable cohort extraction tools across datasets and tasks. |
| Researcher Affiliation | Academia | Justin Xu University of Oxford EMAIL Jack Gallifant Massachusetts Institute of Technology EMAIL Alistair E. W. Johnson University of Toronto EMAIL Matthew B. A. McDermott Harvard Medical School EMAIL |
| Pseudocode | No | The paper describes the 'ACES recursive algorithm' in Section 2.1 and illustrates its workflow in Figure 3. However, it presents a conceptual description and a flow diagram, not structured pseudocode or an algorithm block in a code-like format. |
| Open Source Code | Yes | ACES is available at: https://github.com/justin13601/aces. |
| Open Datasets | Yes | To establish an overview of the computational profile of ACES, the collection of tasks from Table 1 was extracted on MIMIC-IV. The MIMIC-IV MEDS schema has approximately 50,000 patients per shard... Additionally, for users dealing with large datasets, ACES can also be run over a collection of sharded files... Using the OMOP version of the MIMIC-IV Demo, as well as a synthetic dataset of 1,000 patients generated using Synthea (Walonoski et al., 2017) and converted into OMOP |
| Dataset Splits | No | The paper describes the extraction of patient cohorts and evaluates the system's computational performance. While it defines 'input' and 'target' windows for tasks, it does not specify how the *extracted cohorts* are split into training, validation, or test sets for downstream machine learning model development, which is typically required for reproducibility of experimental results. |
| Hardware Specification | Yes | All experiments were executed on a Linux server with 36 cores and 340 GBs of RAM available. All experiments were conducted on a default A100 GPU instance with 84 GB of RAM and 12 CPU cores from Google Cloud Platform's Compute Engine. |
| Software Dependencies | No | The paper mentions 'pip install es-aces' for installing the ACES library and refers to the 'Hydra framework (Yadan, 2019)' for CLI configurations. However, it does not provide specific version numbers for Python, Hydra, or any other critical software libraries or dependencies that would be needed to precisely replicate the experimental environment. |
| Experiment Setup | Yes | Define Task: A task configuration file is required to define the task that the user wishes to extract. This configuration language is simple, clear, yet flexible, permitting users to rapidly share and iterate over task definitions for their clinical settings. Configuration specification is given in Section 2.3. ... Figure 4: Example configuration file for the binary prediction of in-hospital mortality 48 hours after admission. References to predicates and windows are italicized and bolded. (This figure details specific parameters like 'trigger', 'target', 'start', 'end', 'label', 'has', and temporal constraints like 'trigger + 24 hours' or 'gap.end'). |
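The configuration fields quoted above (a `trigger` event, windows with `start`/`end` bounds, `has` constraints, and a `label`) can be illustrated with a hypothetical sketch of the in-hospital-mortality task from Figure 4. The field names below come from the paper's description; the exact YAML syntax and predicate codes are assumptions, not verbatim ACES configuration.

```yaml
# Hypothetical ACES-style task config for in-hospital mortality
# prediction. Field names follow Figure 4 of the paper; the precise
# syntax and predicate codes are illustrative assumptions.
predicates:
  admission:
    code: ADMISSION           # placeholder event code
  death:
    code: DEATH               # placeholder event code

trigger: admission            # prediction anchored at each admission

windows:
  gap:
    start: trigger
    end: trigger + 24 hours   # temporal constraint quoted in Figure 4
    has:
      death: (None, 0)        # exclude patients who die during the gap
  target:
    start: gap.end            # window reference quoted in Figure 4
    end: null                 # extends to the end of the record
    label: death              # binary in-hospital mortality label
```

A sketch like this is what the review cell refers to as the "task configuration file": a declarative description of predicates, a trigger event, and constrained windows, from which ACES extracts the cohort and labels.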