reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Early Concept Drift Detection via Prediction Uncertainty

Authors: Pengqian Lu, Jie Lu, Anjin Liu, Guangquan Zhang

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluations on both synthetic and real-world datasets demonstrate PUDD's efficacy in detecting drift in structured and image data.
Researcher Affiliation	Academia	Australian Artificial Intelligence Institute (AAII), University of Technology Sydney, Ultimo, NSW 2007, Australia {Pengqian.Lu@student., Jie.Lu@, Anjin.Liu@, Guangquan.Zhang@}uts.edu.au
Pseudocode	No	The pseudo-code and time complexity analysis are provided in the Appendix.
Open Source Code	Yes	Code: https://github.com/RocStone/PUDD
Open Datasets	Yes	Our experiments utilize 3 real-world datasets (airline (Ikonomovska 2011), elec2 (Harries 1999), powersupply (Dau et al. 2019)) and 4 synthetic sets (sine (Gama et al. 2004), mixed (Gama et al. 2004), CIFAR-10-CD, sea variants (Bifet et al. 2010)).
Dataset Splits	No	The paper describes partitioning data streams into chunks and using a sliding window strategy for drift detection. For example: "If the data is collected in chunks, then the stream includes a set of chunks $D_{1,t} = \{D_j \mid j \in [1, t]\}$, where each chunk $D_j = \{(x_{jk}, y_{jk}) \mid k \in [1, M]\}$ includes $M$ examples." and "$D_{t-1,t+1}$ is split into $D_{t-1,r}$ and $D_{r,t+1}$ for the Adaptive PU-index Bucketing algorithm." However, it does not provide specific train/test/validation splits with percentages or counts for the classifiers used in the experiments.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies	No	The paper mentions evaluating methods using "three classifiers DNN (architecture detailed in the Appendix), Gaussian Naive Bayes (GNB) (Virtanen et al. 2020), and VFDT (Hulten, Spencer, and Domingos 2001)". While Virtanen et al. 2020 refers to SciPy 1.0, this is a citation for the GNB implementation used, not an explicit statement of multiple software dependencies with version numbers for the authors' own method.
Experiment Setup	Yes	Our method is denoted as PUDD-X, where X represents the exponent in $10^{-X}$. ... The p-value obtained from the Pearson's Chi-square test serves as a precise control mechanism for our tolerance to false alarms. By adjusting the significance level (α), we can directly modulate the trade-off between sensitivity and false positive rate.
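The Experiment Setup row describes flagging drift from the p-value of Pearson's chi-square test, with the significance level α as the sensitivity knob. The following is a minimal sketch of that decision rule only. It is not the authors' PUDD implementation, which lives in the linked repository. The sketch assumes that prediction-uncertainty values from a reference window and a current window have already been counted into the same buckets. It also uses fixed buckets in place of the paper's Adaptive PU-index Bucketing and assumes α = 10^(-X) for PUDD-X. The function names chi2_sf, pearson_chi2_homogeneity and detect_drift are hypothetical. Only the Python standard library is used: the chi-square tail probability comes from the closed forms for integer degrees of freedom.

```python
import math
from typing import Sequence, Tuple

# Sketch only: fixed buckets stand in for the paper's Adaptive PU-index
# Bucketing, and alpha = 10**-x is an assumed reading of "PUDD-X".


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: Q = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: Q = erfc(sqrt(x/2))
    #   + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1/2) / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # the i = 1 term
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)  # advance to the i + 1 term
    return total


def pearson_chi2_homogeneity(
    counts_ref: Sequence[int], counts_cur: Sequence[int]
) -> Tuple[float, int, float]:
    """Pearson chi-square test of homogeneity on a 2 x B table of bucket counts.

    Returns (statistic, degrees of freedom, p-value).
    """
    if len(counts_ref) != len(counts_cur):
        raise ValueError("both windows must use the same buckets")
    # Buckets empty in both windows have zero expected count, so drop them.
    cols = [(a, b) for a, b in zip(counts_ref, counts_cur) if a + b > 0]
    n_ref = sum(a for a, _ in cols)
    n_cur = sum(b for _, b in cols)
    if n_ref == 0 or n_cur == 0:
        raise ValueError("both windows need at least one sample")
    if len(cols) < 2:
        return 0.0, 0, 1.0  # one occupied bucket carries no evidence of change
    n = n_ref + n_cur
    stat = 0.0
    for a, b in cols:
        col_total = a + b
        for observed, row_total in ((a, n_ref), (b, n_cur)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    df = len(cols) - 1  # (rows - 1) * (cols - 1) with 2 rows
    return stat, df, chi2_sf(stat, df)


def detect_drift(counts_ref: Sequence[int], counts_cur: Sequence[int], x: int) -> bool:
    """Hypothetical rule: flag drift when p < alpha, assuming alpha = 10**-x."""
    alpha = 10.0 ** (-x)
    _, _, p = pearson_chi2_homogeneity(counts_ref, counts_cur)
    return p < alpha


if __name__ == "__main__":
    ref = [50, 30, 20]
    print(detect_drift(ref, [48, 33, 19], x=3))  # False: bucket shares barely move
    print(detect_drift(ref, [10, 20, 70], x=3))  # True: mass shifts to the last bucket
```

Raising X lowers α, so fewer windows are flagged. This gives fewer false alarms at the cost of slower or missed detections, which is the sensitivity and false-positive trade-off the row describes.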