reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Explainable Data Decompositions

Authors: Sebastian Dalleiger, Jilles Vreeken

Pages: 3709-3716

AAAI 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation on synthetic and real-world data shows that DISC efficiently discovers meaningful components and accurately characterises these in easily understandable terms.
Researcher Affiliation	Academia	Sebastian Dalleiger, Jilles Vreeken CISPA Helmholtz Center for Information Security EMAIL
Pseudocode	Yes	Algorithm 1: DESC for Describing the Composition and Algorithm 2: DISC for Discovering the Composition
Open Source Code	Yes	We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility. (Footnote 1: https://eda.mmci.uni-saarland.de/disc/)
Open Datasets	Yes	We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility. (Footnote 1: https://eda.mmci.uni-saarland.de/disc/)
Dataset Splits	No	The paper evaluates on synthetic and real-world datasets but does not explicitly provide details about train/validation/test splits (e.g., percentages, sample counts, or specific split methodologies) for reproduction.
Hardware Specification	Yes	We implemented DISC in C++, ran experiments on a 12-Core Intel Xeon E5-2643 CPU, and report wall-clock time.
Software Dependencies	No	The paper states 'We implemented DISC in C++' but does not provide specific version numbers for key software components, libraries, or solvers.
Experiment Setup	Yes	In all experiments we have used the same significance level α = 0.01. and Since DBSCAN relies on hyper-parameter, we optimize ℓ using a grid-search over 7 ϵ-candidates and we do not constraint cluster-sizes.
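
The Experiment Setup row mentions a grid search over 7 ϵ-candidates for the DBSCAN baseline. The sketch below illustrates that general idea only and is not the authors' code. The paper's actual objective ℓ is not given in this summary, so `toy_score`, the candidate values, `min_pts`, and the sample points are all hypothetical placeholders, and the small `dbscan` function is a plain textbook version written here for self-containment.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal textbook DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels

def toy_score(labels):
    """Hypothetical stand-in objective: reward clusters, penalise noise points."""
    return len(set(labels) - {-1}) - labels.count(-1)

def grid_search_eps(points, candidates, min_pts, score):
    """Run DBSCAN once per eps candidate and keep the best-scoring result."""
    best = None
    for eps in candidates:
        labels = dbscan(points, eps, min_pts)
        s = score(labels)
        if best is None or s > best[0]:  # strict '>' keeps the smallest eps on ties
            best = (s, eps, labels)
    return best[1], best[2]

# Two well-separated groups of four points each (illustrative data only).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
candidates = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]  # 7 assumed eps candidates

best_eps, best_labels = grid_search_eps(points, candidates, min_pts=3, score=toy_score)
print(best_eps, best_labels)
```

Passing the objective in as `score` keeps the search loop independent of whatever criterion is actually optimised, so the placeholder can be swapped out without touching the rest of the sketch.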