Maximum Mean Discrepancy on Exponential Windows for Online Change Detection

Authors: Florian Kalinke, Marco Heyden, Georg Gntuni, Edouard Fouché, Klemens Böhm

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments on standard benchmark data streams show that MMDEW obtains the best F1-score on most data sets."
Researcher Affiliation | Academia | Florian Kalinke, Karlsruhe Institute of Technology, Germany
Pseudocode | Yes | Algorithm 1: Proposed MMDEW change detection algorithm.
Open Source Code | Yes | "Our code is available at https://github.com/FlopsKa/mmdew-change-detector."
Open Datasets | Yes | Listed as (observations, dimensionality, changes): CIFAR10 (Krizhevsky et al., 2009): 60,000, 1,024, 9; Fashion MNIST (Xiao et al., 2017): 70,000, 784, 9; Gas (Vergara et al., 2012): 13,910, 128, 5; HAR (Anguita et al., 2013): 10,299, 561, 5; MNIST (Deng, 2012): 70,000, 784, 9.
Dataset Splits | No | The paper does not provide explicit training/test/validation dataset splits. It describes how classification data sets are converted into streams for change detection: "For each data set, we first order the observations by their classes; a change occurs if the class changes. To introduce variation into the order of change points, we randomly permute the order of the classes before each run but use the same permutation across all algorithms."
Hardware Specification | Yes | "We ran all experiments on a server running Ubuntu 20.04 with 124 GB RAM and 32 cores with 2 GHz each."
Software Dependencies | No | The paper names Ubuntu 20.04 as the server operating system but does not give version numbers for any software libraries, frameworks, or programming languages used in the implementation.
Experiment Setup | Yes | "We run a grid parameter optimization per data set and algorithm and report the best result w.r.t. the F1-score. We note that such an optimization is difficult to perform in practice (here one typically prefers approaches with fewer or easy-to-set parameters) but allows a fair comparison. Table 3 in Appendix C lists all the parameters we tested. For kernel-based algorithms (MMDEW, NEWMA, and Scan B-statistics) we use the Gaussian kernel k(x, y) = exp(-γ‖x - y‖²) (γ > 0) and set γ using the median heuristic (Garreau et al., 2018) on the first 100 observations. We also supply the first 100 observations to competitors requiring data upfront to estimate further parameters (IBDD, WATCH)."
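To make the experiment-setup row concrete, the following is a minimal, illustrative Python sketch (not the authors' MMDEW implementation): it sets the Gaussian-kernel parameter γ with one common variant of the median heuristic and computes a biased quadratic-time estimate of the squared MMD between two samples. The function names and the exact heuristic variant (γ = 1 / (2 · median(‖x − y‖)²)) are assumptions for illustration.

```python
import numpy as np


def median_heuristic_gamma(X):
    """One common median-heuristic variant (assumed here):
    sigma = median pairwise distance, gamma = 1 / (2 * sigma**2)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigma = np.median(np.sqrt(d2[np.triu_indices_from(d2, k=1)]))
    return 1.0 / (2.0 * sigma ** 2)


def gaussian_kernel(X, Y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2), evaluated for all pairs."""
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2)


def mmd2_biased(X, Y, gamma):
    """Biased quadratic-time estimate of the squared MMD between X and Y.
    Equals the squared RKHS distance of the empirical mean embeddings,
    so it is always nonnegative."""
    kxx = gaussian_kernel(X, X, gamma).mean()
    kyy = gaussian_kernel(Y, Y, gamma).mean()
    kxy = gaussian_kernel(X, Y, gamma).mean()
    return kxx + kyy - 2.0 * kxy
```

A batch two-sample statistic like this is only the building block; per the paper's title, MMDEW maintains such estimates online over exponentially sized windows, which the sketch above does not attempt to reproduce.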