Causality-Aware Contrastive Learning for Robust Multivariate Time-Series Anomaly Detection

Authors: Hyungi Kim, Jisoo Mok, Dongjun Lee, Jaihyun Lew, Sungjae Kim, Sungroh Yoon

ICML 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | "Extensive experiments on five real-world and two synthetic datasets validate that the integration of causal relationships endows CAROTS with improved MTSAD capabilities. We validate the effectiveness of CAROTS across five real-world MTSAD datasets, where it consistently outperforms existing MTSAD methods. Lastly, the ablation study on each technical component of CAROTS verifies its individual contributions."
Researcher Affiliation | Collaboration | "(1) Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea; (2) Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea; (3) Hyundai Motor Company, Gyeonggi-do, Republic of Korea; (4) AIIS, ASRI, and INMC, Seoul National University, Seoul, Republic of Korea. Correspondence to: Sungroh Yoon <EMAIL>."
Pseudocode | No | The paper describes the methodology in detail across Sections 3 and 4, illustrating the pipeline in Figure 1 and the augmentors in Figure 2, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | "The code is available at https://github.com/kimanki/CAROTS."
Open Datasets | Yes | "We demonstrate the effectiveness of CAROTS on five widely used real-world MTSAD datasets, SWaT (Goh et al., 2016), WADI (Ahmed et al., 2017), PSM (Abdulaal et al., 2021), SMD (Su et al., 2019), and MSL (Hundman et al., 2018), and two synthetic datasets, VAR and Lorenz96 (Karimi & Paul, 2010)."
Dataset Splits | Yes | "We use 20% of the training data as validation data and apply standard normalization to the entire dataset using the mean and standard deviation of training data. We construct training, validation, and test sets using a sliding window... For Lorenz96 and VAR, the window sizes were set to 2 and 4 following (Cheng et al., 2024a), respectively. In this experiment, synthetic datasets were generated with N = 128 and a total length of 40,000 time steps, which were split into 16,000, 4,000, and 20,000 steps for the train, validation, and test sets, respectively."
Hardware Specification | Yes | "Training was performed on a single NVIDIA A40 GPU."
Software Dependencies | No | The paper mentions using the 'Adam optimizer' (Kingma & Ba, 2015), 'gradient clipping' (Pascanu et al., 2013), and 'cosine learning rate scheduling' (Loshchilov & Hutter, 2017), but does not specify version numbers for any software libraries (e.g., PyTorch, TensorFlow) or the programming language used.
Experiment Setup | Yes | "Unless specified otherwise, we train models with a window size of 10 and a batch size of 256 for 30 epochs. For contrastive learning-based models including CAROTS, we primarily employ an LSTM encoder (Hochreiter & Schmidhuber, 1997) following CTAD (Kim et al., 2023) and set the temperature parameter to 0.1. The threshold for similarity filtering is initialized to 0.5 and linearly increased to 0.9. Each model was optimized using Adam optimizer... The learning rate followed a cosine learning rate scheduling... starting at 0.0001 and increasing linearly over 5 epochs as a warm-up phase. A hyperparameter search was conducted over the learning rate in {0.001, 0.0003, 0.0001} and weight decay in {0.001, 0.0001, 0.0}."
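The dataset-split procedure quoted in the table (20% of training data held out for validation, standard normalization with training-set statistics, sliding-window construction) can be sketched as below. The function names, the stride of 1, and the epsilon guard are assumptions for illustration; only the split fraction, normalization scheme, and windowing are stated in the paper.

```python
import numpy as np

def make_windows(series: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Slice a (T, N) multivariate series into overlapping (W, N) windows."""
    return np.stack([series[i:i + window]
                     for i in range(0, len(series) - window + 1, stride)])

def split_and_normalize(train_raw: np.ndarray, test_raw: np.ndarray, window: int):
    """Hold out the last 20% of training data for validation, normalize all
    splits with mean/std computed on the training portion only, then window."""
    cut = int(len(train_raw) * 0.8)
    train, val = train_raw[:cut], train_raw[cut:]
    mu = train.mean(axis=0)
    sigma = train.std(axis=0) + 1e-8  # epsilon guard against constant channels (assumption)
    norm = lambda x: (x - mu) / sigma
    return (make_windows(norm(train), window),
            make_windows(norm(val), window),
            make_windows(norm(test_raw), window))
```

With a window size of 10 (the paper's default for the real-world datasets), an 80/20 split of a length-100 training series yields 71 training windows and 11 validation windows.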
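The optimization settings quoted above (base learning rate 0.0001, 5-epoch linear warm-up, cosine learning rate scheduling over 30 epochs, similarity-filtering threshold rising linearly from 0.5 to 0.9) can be sketched as scalar schedules. The decay-to-zero endpoint and the per-epoch (rather than per-step) granularity are assumptions, since the paper's quotes do not fix them.

```python
import math

EPOCHS, WARMUP, BASE_LR = 30, 5, 1e-4

def lr_at(epoch: int) -> float:
    """Linear warm-up to BASE_LR over WARMUP epochs, then cosine decay toward 0."""
    if epoch < WARMUP:
        return BASE_LR * (epoch + 1) / WARMUP
    progress = (epoch - WARMUP) / (EPOCHS - WARMUP)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

def filter_threshold(epoch: int, start: float = 0.5, end: float = 0.9) -> float:
    """Similarity-filtering threshold, increased linearly from 0.5 to 0.9."""
    return start + (end - start) * epoch / (EPOCHS - 1)
```

Under this sketch the learning rate reaches its peak of 0.0001 at the end of warm-up and decays monotonically afterwards, while the filtering threshold grows from 0.5 at the first epoch to 0.9 at the last.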