Guided Visual Exploration of Relations in Data Sets

Authors: Kai Puolamäki, Emilia Oikarinen, Andreas Henelius

JMLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental In this section we first consider the stability and scalability of the framework presented in this paper. After this, we present examples of how the proposed method is used to explore relations in a data set and to focus on investigating a hypothesis concerning relations in a subset of the data. An open source library implementing the proposed framework, including the code for the experiments presented in this paper, is available from https://github.com/edahelsinki/corand/. All the experiments were run on a MacBook Pro laptop with a 3.1 GHz Intel Core i5 processor using R version 3.5.2 (R Core Team, 2018).
Researcher Affiliation Collaboration Kai Puolamäki: Institute for Atmospheric and Earth System Research, Department of Computer Science, P.O. Box 68, FI-00014 University of Helsinki, Helsinki, Finland. Emilia Oikarinen: Department of Computer Science, P.O. Box 68, FI-00014 University of Helsinki, Helsinki, Finland. Andreas Henelius: OP Financial Group, Gebhardinaukio 1, FI-00510 Helsinki, Finland.
Pseudocode Yes Appendix A. Algorithm for merging tiles
    input : Tiling T as an n × m data matrix where an element is the ID of the tile to which it belongs, and a tile t = (R, C).
    output: T + {t} (the tiling in which T is merged with t).
    S ← HashMap;
    for i ∈ R do
        K ← T(i, C);
        if K ∉ keys(S) then
            S(K) ← Tuple;
            S(K)_rows ← {i};
            S(K)_id ← unique(T(i, C));
        else
            S(K)_rows ← S(K)_rows ∪ {i};
        end
    end
    p_max ← max(T(R, C));
    for K ∈ keys(S) do
        C′ = {c | T(S(K)_rows, c) ∈ S(K)_id};
        T(S(K)_rows, C′) ← p_max + 1;
        p_max ← p_max + 1;
    end
    return T
Algorithm 1: Merging a tile t with the tiles in a tiling T. The function HashMap denotes a hash map. The value in a hash map H associated with a key x is H(x) and keys(H) gives the keys of H. The function Tuple creates a (named) tuple. An element a in a tuple w = (a, b) is accessed as w_a. The function unique returns the unique elements of an array.
Open Source Code Yes An open source library implementing the proposed framework, including the code for the experiments presented in this paper, is available from https://github.com/edahelsinki/corand/
Open Datasets Yes The german socioeconomic data set (Boley et al., 2013; Kang et al., 2016a) contains records from 412 German administrative districts. ... (Footnote 3: Available from http://users.ugent.be/~bkang/software/sica/sica.zip)
Dataset Splits No The paper describes perturbing synthetic data for stability analysis by randomly removing rows and adding Gaussian noise but does not provide specific training/test/validation dataset splits for model evaluation or reproduction of results. The subsequent sections describe exploration on the full dataset or subsets based on visual selection.
Hardware Specification Yes All the experiments were run on a MacBook Pro laptop with a 3.1 GHz Intel Core i5 processor using R version 3.5.2 (R Core Team, 2018).
Software Dependencies Yes All the experiments were run on a MacBook Pro laptop with a 3.1 GHz Intel Core i5 processor using R version 3.5.2 (R Core Team, 2018).
Experiment Setup Yes A synthetic data set, parametrised by the noise term σ and an integer n, is constructed as follows. First, we randomly remove n rows from the data, after which Gaussian noise with variance σ² is added to the remaining variables, and finally all variables are rescaled to zero mean and unit variance.
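
The tile-merging procedure of Algorithm 1 (the Pseudocode entry above) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is an R library. The function name `merge_tile`, the 0-based indexing, and the use of NumPy are assumptions made for illustration. The sketch reads the step that computes C′ as "all columns whose tile IDs, on the grouped rows, belong to the group's ID set", and it takes the maximum ID over the whole matrix so that new IDs are guaranteed to be unused. Both choices are assumptions of this sketch rather than statements about the paper.

```python
import numpy as np


def merge_tile(T, R, C):
    """Return a copy of tiling T with the tile (R, C) merged in.

    T is an n x m integer matrix whose entries are tile IDs.
    R and C are iterables of 0-based row and column indices.
    """
    T = np.array(T, dtype=int)  # work on a copy
    R, C = list(R), list(C)

    # Group the rows of the new tile by the tile IDs they currently
    # carry in the columns C (the hash map S of the pseudocode).
    groups = {}
    for i in R:
        key = tuple(T[i, C])
        groups.setdefault(key, []).append(i)

    # Assumption: take the maximum over all of T so that every new
    # ID is fresh across the whole tiling.
    p_max = int(T.max())

    for key, group_rows in groups.items():
        ids = np.unique(key)
        # Columns in which every grouped row carries an ID from `ids`.
        in_ids = np.isin(T[group_rows, :], ids)
        cols = np.flatnonzero(in_ids.all(axis=0))
        # Fuse those cells into one new tile.
        T[np.ix_(group_rows, cols)] = p_max + 1
        p_max += 1
    return T


# Hypothetical example: start with one tile per column, then merge
# a tile covering rows {0, 1} and columns {0, 1}.
T0 = np.tile(np.arange(3), (4, 1))
T1 = merge_tile(T0, R=[0, 1], C=[0, 1])
```

In this example the four cells in rows 0 and 1, columns 0 and 1, receive the new ID 3, while every other cell keeps its column ID.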
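
The perturbation described in the Experiment Setup entry (remove n rows, add Gaussian noise, rescale) can be sketched as follows. This is a hedged Python illustration, not the authors' R code. The function name `perturb`, its parameters, and the `seed` argument are hypothetical, and the use of the sample standard deviation (`ddof=1`) for rescaling is an assumption.

```python
import numpy as np


def perturb(X, n_remove, sigma, seed=None):
    """Remove rows at random, add Gaussian noise, then standardise.

    X        : array of shape (rows, variables)
    n_remove : number of rows to drop at random (the integer n)
    sigma    : standard deviation of the noise (variance sigma**2)
    seed     : optional seed for reproducibility (hypothetical)
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_rows = X.shape[0]
    if not 0 <= n_remove <= n_rows - 2:
        raise ValueError("need at least two rows after removal")

    # Step 1: keep a random subset of rows, in their original order.
    keep = np.sort(rng.choice(n_rows, size=n_rows - n_remove, replace=False))
    Y = X[keep]

    # Step 2: add zero-mean Gaussian noise with variance sigma**2.
    Y = Y + rng.normal(0.0, sigma, size=Y.shape)

    # Step 3: rescale each variable to zero mean and unit variance.
    # Assumption: sample standard deviation (ddof=1).
    return (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)


# Hypothetical usage on random data: drop 3 of 10 rows, sigma = 0.5.
X_demo = np.random.default_rng(0).normal(size=(10, 4))
Y_demo = perturb(X_demo, n_remove=3, sigma=0.5, seed=1)
```

Passing a fixed `seed` makes a perturbed data set reproducible, which is convenient when comparing results across several values of σ and n.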