reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robust Root Cause Diagnosis using In-Distribution Interventions

Authors: Lokesh Nagalapatti, Ashutosh Srivastava, Sunita Sarawagi, Amit Sharma

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on both synthetic and Pet Shop RCD benchmark datasets demonstrate that IDI consistently identifies true root causes more accurately and robustly than nine existing state-of-the-art RCD baselines. We then conduct experiments by systematically varying the SCM's complexity to demonstrate the cases where IDI's interventional approach outperforms the counterfactual approach and vice versa.
Researcher Affiliation	Collaboration	1 Indian Institute of Technology Bombay; 2 International Institute of Information Technology Hyderabad; 3 Microsoft Research India
Pseudocode	Yes	The pseudocode for the multi-root-cause diagnosis algorithm is presented in Alg 1 in the Appendix.
Open Source Code	No	Code will be released at https://github.com/nlokeshiisc/IDI_release.
Open Datasets	Yes	Experiments on both synthetic and Pet Shop RCD benchmark datasets demonstrate that IDI consistently identifies true root causes more accurately and robustly than nine existing state-of-the-art RCD baselines. Pet Shop (Hardt et al., 2024) is a recent dataset designed for benchmarking RCD methods in the cloud domain, featuring a call graph G that causally links key performance indicators (KPIs).
Dataset Splits	Yes	We generate n ∈ {25, 50, 100, 1000} training samples, along with 100 validation and 100 test samples, each with a unique root cause.
Hardware Specification	No	No specific hardware details (GPU/CPU models, processor types, memory amounts, or detailed computer specifications) were explicitly mentioned for running the experiments. The paper generally refers to 'cloud services' without further specifications.
Software Dependencies	No	We implemented IDI in the RCD library released by Pet Shop (Hardt et al., 2024). Pet Shop uses DoWhy (Sharma & Kiciman, 2020) and gcm (Blöbaum et al., 2022) for causal inference and PyRCA (Liu et al., 2023) for root cause analysis.
Experiment Setup	Yes	We sample the linear weights w1, w2, w3 from N(0, 1) and define the non-linear model as X4 = |X2| + exp( X3) 3 + ϵ4. We draw the exogenous variables ϵ1, ϵ2, ϵ3 from N(0, 1) and define the structural equations for the root nodes as Xi = fi(ϵi) = ϵi for i ∈ {1, 2, 3}. ... We fit the linear model using closed-form regression and train the non-linear model as a three-layer MLP with 10 hidden nodes and ReLU activations via gradient descent. For the toy experiment, we sample x_j^fix from its true distribution N(0, 1). ... To assess anomalies, we use the Z-score, defined for Xi as Z-score(xi) = |xi − µi| / σi, where µi and σi are the sample mean and standard deviation computed for the ith node in the training data. ... Exogenous variables follow a uniform distribution ϵi ∼ U[0, 1], making their standard deviation std(ϵi) ≈ 0.3. ... We start our search from 0 and increase them in steps of size 0.25 until the Z-score of the target node ϕn(xn) hits the anomaly threshold 3.
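
The Z-score and the step search quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names z_score, search_shift, and target_fn are hypothetical, the demo uses only the linear variant of the toy SCM, and ϵ4 ~ N(0, 1) is an assumption because the excerpt does not state the noise distribution of X4. The excerpt also elides which quantity is increased in steps of 0.25; the demo shifts X1 purely for illustration.

```python
import numpy as np


def z_score(x, mu, sigma):
    """Z-score(x_i) = |x_i - mu_i| / sigma_i, with mu_i and sigma_i from training data."""
    return np.abs(x - mu) / sigma


def search_shift(target_fn, mu, sigma, step=0.25, threshold=3.0, max_steps=10_000):
    """Grow a shift from 0 in fixed steps until the target's Z-score reaches the threshold.

    target_fn is a hypothetical callable that maps a shift to the target node's value.
    """
    for k in range(max_steps):
        shift = k * step
        if z_score(target_fn(shift), mu, sigma) >= threshold:
            return shift
    raise RuntimeError("anomaly threshold not reached within max_steps")


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Linear toy SCM: weights w1, w2, w3 ~ N(0, 1); root nodes X_i = eps_i ~ N(0, 1).
    w = rng.normal(size=3)
    roots = rng.normal(size=(1000, 3))
    eps4 = rng.normal(size=1000)  # assumption: noise of X4 is N(0, 1)
    x4 = roots @ w + eps4

    # Sample mean and standard deviation of the target node on the training data.
    mu4, sigma4 = x4.mean(), x4.std(ddof=1)

    # Illustrative search: shift X1 of the first sample until X4 looks anomalous.
    base = roots[0] @ w + eps4[0]
    shift = search_shift(lambda s: base + w[0] * s, mu4, sigma4)
    print(f"shift on X1 that makes X4 anomalous: {shift}")
```

Computing the step as k * step rather than accumulating additions keeps each candidate shift an exact multiple of 0.25, so the value returned at the threshold does not drift with floating-point error.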