Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Merlion: End-to-End Machine Learning for Time Series

Authors: Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we use Merlion to benchmark the performance of various models on both forecasting (Table 1) and anomaly detection (Table 2). ... For each task, we first train an initial model on the training split of a time series, and then re-train the model unsupervised either daily or hourly on the full data until that point ... Table 1 shows that Merlion's AutoML module is effective at improving the performance of multiple different forecasting models; Table 2 shows that our proposed ensemble method (a unique offering from Merlion) robustly achieves strong anomaly detection performance.
Researcher Affiliation | Industry | Aadyot Bhatnagar*, Paul Kassianik*, Chenghao Liu*, Tian Lan*, Wenzhuo Yang*, Rowan Cassius, Doyen Sahoo*, Devansh Arpit*, Sri Subramanian, Gerald Woo*, Amrita Saha*, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou*, Caiming Xiong*, Silvio Savarese*, Steven Hoi*, Huan Wang*. AI Research, Salesforce. Corresponding Authors: EMAIL. Monitoring Cloud, Salesforce. Warden AIOps, Salesforce. Service Protection, Salesforce.
Pseudocode | No | The paper describes methods and components of the Merlion library but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | This work introduces Merlion¹, a Python library for time series intelligence. ¹ Code: https://github.com/salesforce/Merlion.
Open Datasets | Yes | m4-hour m4-day m4-week m4-month m4-quarter m4-year internal1 internal2 internal3 ... Table 1: Mean sMAPE achieved by univariate forecasting models on M4 (Makridakis et al., 2018) and 3 internal datasets. ... internal NAB AIOps UCR F1 ... Table 2: F1 scores achieved by univariate anomaly detection models. ... AIOps Challenge, 2018. URL http://iops.ai/competition_detail/?competition_id=5. ... H. A. Dau et al. The UCR time series classification archive, October 2018. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/. ... A. Lavin and S. Ahmad. Evaluating real-time anomaly detection algorithms - the Numenta anomaly benchmark. CoRR, abs/1510.03336, 2015. URL http://arxiv.org/abs/1510.03336.
Dataset Splits | Yes | For each task, we first train an initial model on the training split of a time series, and then re-train the model unsupervised either daily or hourly on the full data until that point (without adjusting the calibrator or threshold). We then incrementally obtain predictions for the full time series, in a way that simulates a live deployment scenario.
Hardware Specification | No | The paper discusses a distributed computing backend using PySpark and Kubernetes for deployment but does not specify any hardware details (CPU, GPU models, memory) used for running experiments.
Software Dependencies | No | Merlion is an open-source machine learning library for time series... a Python library... a distributed back-end that uses PySpark (Zaharia et al., 2016)... The paper does not provide specific version numbers for software dependencies such as Python, PySpark, or other libraries.
Experiment Setup | No | To avoid the possible risk of label leaking through manual hyperparameter tuning, for all experiments, we evaluate all models with a single choice of sensible default hyperparameters and data pre-processing, regardless of dataset. ... The paper states that default hyperparameters were used but does not provide their specific values.
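
The evaluation protocol quoted under Dataset Splits (train once on the training split, retrain periodically on all data seen so far, and collect predictions incrementally) can be illustrated with a minimal sketch. This is not Merlion's API: `MeanForecaster`, `rolling_evaluate`, and the parameters `n_train` and `retrain_every` are hypothetical names chosen for illustration, and the forecaster is a deliberately trivial stand-in. The `smape` helper assumes the M4 convention for sMAPE, since the quoted text does not spell out the formula.

```python
from statistics import mean


def smape(actual, predicted):
    """Symmetric MAPE in percent, assuming the M4 convention:
    mean of 200 * |y - yhat| / (|y| + |yhat|)."""
    terms = []
    for y, yhat in zip(actual, predicted):
        denom = abs(y) + abs(yhat)
        # Treat a 0/0 term as zero error to avoid division by zero.
        terms.append(0.0 if denom == 0 else 200.0 * abs(y - yhat) / denom)
    return mean(terms)


class MeanForecaster:
    """Hypothetical stand-in model: forecasts the mean of the last `window` points."""

    def __init__(self, window=3):
        self.window = window
        self.level = 0.0

    def train(self, history):
        self.level = mean(history[-self.window:])

    def forecast(self, horizon):
        return [self.level] * horizon


def rolling_evaluate(series, n_train, retrain_every, model):
    """Train on series[:n_train], then walk forward through the rest of the
    series, retraining on all data seen so far every `retrain_every` steps."""
    model.train(series[:n_train])
    predictions = []
    for t in range(n_train, len(series), retrain_every):
        if t > n_train:
            model.train(series[:t])  # retrain on the full data until this point
        horizon = min(retrain_every, len(series) - t)
        predictions.extend(model.forecast(horizon))
    return predictions


series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
preds = rolling_evaluate(series, n_train=4, retrain_every=2, model=MeanForecaster())
score = smape(series[4:], preds)
```

Here `retrain_every` plays the role of the daily or hourly retraining cadence; each forecast only uses data available before the window it predicts, which is what makes the walk-forward loop resemble a live deployment.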