Explainable Data Decompositions

Authors: Sebastian Dalleiger, Jilles Vreeken
Pages: 3709-3716

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical evaluation on synthetic and real-world data shows that DISC efficiently discovers meaningful components and accurately characterises these in easily understandable terms."
Researcher Affiliation | Academia | Sebastian Dalleiger, Jilles Vreeken; CISPA Helmholtz Center for Information Security; EMAIL
Pseudocode | Yes | Algorithm 1: DESC for Describing the Composition; Algorithm 2: DISC for Discovering the Composition
Open Source Code | Yes | "We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility." (Footnote 1: https://eda.mmci.uni-saarland.de/disc/)
Open Datasets | Yes | "We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility." (Footnote 1: https://eda.mmci.uni-saarland.de/disc/)
Dataset Splits | No | The paper evaluates on synthetic and real-world datasets but does not explicitly provide details about train/validation/test splits (e.g., percentages, sample counts, or specific split methodologies) for reproduction.
Hardware Specification | Yes | "We implemented DISC in C++, ran experiments on a 12-Core Intel Xeon E5-2643 CPU, and report wall-clock time."
Software Dependencies | No | The paper states "We implemented DISC in C++" but does not provide specific version numbers for key software components, libraries, or solvers.
Experiment Setup | Yes | "In all experiments we have used the same significance level α = 0.01." and "Since DBSCAN relies on hyper-parameter, we optimize ℓ using a grid-search over 7 ϵ-candidates and we do not constraint cluster-sizes."