reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Merlion: End-to-End Machine Learning for Time Series

Authors: Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we use Merlion to benchmark the performance of various models on both forecasting (Table 1) and anomaly detection (Table 2). ... For each task, we first train an initial model on the training split of a time series, and then re-train the model unsupervised either daily or hourly on the full data until that point ... Table 1 shows that Merlion's AutoML module is effective at improving the performance of multiple different forecasting models; Table 2 shows that our proposed ensemble method (a unique offering from Merlion) robustly achieves strong anomaly detection performance.
Researcher Affiliation	Industry	Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang. AI Research, Salesforce. Corresponding Authors: EMAIL. Monitoring Cloud, Salesforce. Warden AIOps, Salesforce. Service Protection, Salesforce.
Pseudocode	No	The paper describes methods and components of the Merlion library but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	This work introduces Merlion¹, a Python library for time series intelligence. ... ¹ Code: https://github.com/salesforce/Merlion.
Open Datasets	Yes	m4-hour m4-day m4-week m4-month m4-quarter m4-year internal1 internal2 internal3 ... Table 1: Mean sMAPE achieved by univariate forecasting models on M4 (Makridakis et al., 2018) and 3 internal datasets. ... internal NAB AIOps UCR F1 ... Table 2: F1 scores achieved by univariate anomaly detection models. ... AIOps Challenge, 2018. URL http://iops.ai/competition_detail/?competition_id=5. ... H. A. Dau et al. The ucr time series classification archive, October 2018. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/. ... A. Lavin and S. Ahmad. Evaluating real-time anomaly detection algorithms - the numenta anomaly benchmark. CoRR, abs/1510.03336, 2015. URL http://arxiv.org/abs/1510.03336.
Dataset Splits	Yes	For each task, we first train an initial model on the training split of a time series, and then re-train the model unsupervised either daily or hourly on the full data until that point (without adjusting the calibrator or threshold). We then incrementally obtain predictions for the full time series, in a way that simulates a live deployment scenario.
Hardware Specification	No	The paper discusses a distributed computing backend using PySpark and Kubernetes for deployment but does not specify any hardware details (CPU, GPU models, memory) used for running experiments.
Software Dependencies	No	Merlion is an open-source machine learning library for time series... a Python library... a distributed back-end that uses PySpark (Zaharia et al., 2016)... The paper does not provide specific version numbers for software dependencies such as Python, PySpark, or other libraries.
Experiment Setup	No	To avoid the possible risk of label leaking through manual hyperparameter tuning, for all experiments, we evaluate all models with a single choice of sensible default hyperparameters and data pre-processing, regardless of dataset. ... The paper states that default hyperparameters were used but does not provide their specific values.
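
Illustration (not from the paper): the evaluation protocol quoted under Dataset Splits, train on an initial split, then periodically re-train on all data seen so far while predicting incrementally, could be sketched roughly as follows. This is a minimal plain-Python sketch and does not use the Merlion API. `MeanForecaster`, `rolling_evaluation`, and the `retrain_every` parameter (a count of points rather than the paper's daily or hourly cadence) are hypothetical names chosen for illustration, and the sMAPE definition shown is the common symmetric form, assumed rather than taken from the paper.

```python
from statistics import mean


def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed common form).

    Pairs whose denominator is zero contribute 0 to the average.
    """
    terms = []
    for y, y_hat in zip(actual, predicted):
        denom = abs(y) + abs(y_hat)
        terms.append(0.0 if denom == 0 else 200.0 * abs(y - y_hat) / denom)
    return mean(terms)


class MeanForecaster:
    """Hypothetical stand-in model: predicts the mean of the last `window` points."""

    def __init__(self, window=3):
        self.window = window
        self.level = 0.0

    def train(self, history):
        # Fit on the most recent `window` observations only.
        self.level = mean(history[-self.window:])

    def forecast(self, horizon):
        # Flat forecast at the fitted level.
        return [self.level] * horizon


def rolling_evaluation(series, n_train, retrain_every, model):
    """Train on series[:n_train], then predict the remainder in chunks,
    re-training on all data seen so far before each new chunk."""
    model.train(series[:n_train])
    predictions = []
    t = n_train
    while t < len(series):
        horizon = min(retrain_every, len(series) - t)
        predictions.extend(model.forecast(horizon))
        t += horizon
        if t < len(series):
            # Simulate a live deployment: refit on everything observed so far.
            model.train(series[:t])
    return predictions


series = [1, 2, 3, 4, 5, 6, 7, 8]
preds = rolling_evaluation(series, n_train=4, retrain_every=2,
                           model=MeanForecaster(window=2))
score = smape(series[4:], preds)
```

Each prediction here depends only on data observed before it, which is the property the quoted protocol relies on when it describes simulating a live deployment.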