reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Guided Visual Exploration of Relations in Data Sets

Authors: Kai Puolamäki, Emilia Oikarinen, Andreas Henelius

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section we first consider the stability and scalability of the framework presented in this paper. After this, we present examples of how the proposed method is used to explore relations in a data set and to focus on investigating a hypothesis concerning relations in a subset of the data. An open source library implementing the proposed framework, including the code for the experiments presented in this paper, is available from https://github.com/edahelsinki/corand/. All the experiments were run on a MacBook Pro laptop with a 3.1 GHz Intel Core i5 processor using R version 3.5.2 (R Core Team, 2018).
Researcher Affiliation	Collaboration	Kai Puolamäki (EMAIL), Institute for Atmospheric and Earth System Research, Department of Computer Science, P.O. Box 68, FI-00014 University of Helsinki, Helsinki, Finland; Emilia Oikarinen (EMAIL), Department of Computer Science, P.O. Box 68, FI-00014 University of Helsinki, Helsinki, Finland; Andreas Henelius (EMAIL), OP Financial Group, Gebhardinaukio 1, FI-00510 Helsinki, Finland
Pseudocode	Yes	Appendix A. Algorithm for merging tiles. Input: tiling T as an n × m data matrix where an element is the ID of the tile to which it belongs, and a tile t = (R, C). Output: T + {t} (the tiling in which T is merged with t). 1 S ← HashMap; 2 for i ∈ R do 3 K ← T(i, C); 4 if K ∉ keys(S) then 5 S(K) ← Tuple; 6 S(K)_rows ← {i}; 7 S(K)_id ← unique(T(i, C)); 8 else 9 S(K)_rows ← S(K)_rows ∪ {i}; 10 end 11 end 12 p_max ← max(T(R, C)); 13 for K ∈ keys(S) do 14 C′ = {c | T(S(K)_rows, c) ∈ S(K)_id}; 15 T(S(K)_rows, C′) ← p_max + 1; 16 p_max ← p_max + 1; 17 end 18 return T. Algorithm 1: Merging a tile t with the tiles in a tiling T. The function HashMap denotes a hash map. The value in a hash map H associated with a key x is H(x) and keys(H) gives the keys of H. The function Tuple creates a (named) tuple. An element a in a tuple w = (a, b) is accessed as w_a. The function unique returns the unique elements of an array.
Open Source Code	Yes	An open source library implementing the proposed framework, including the code for the experiments presented in this paper, is available from https://github.com/edahelsinki/corand/
Open Datasets	Yes	The german socioeconomic data set (Boley et al., 2013; Kang et al., 2016a)³ contains records from 412 German administrative districts. ... ³ Available from http://users.ugent.be/~bkang/software/sica/sica.zip
Dataset Splits	No	The paper describes perturbing synthetic data for stability analysis by randomly removing rows and adding Gaussian noise but does not provide specific training/test/validation dataset splits for model evaluation or reproduction of results. The subsequent sections describe exploration on the full dataset or subsets based on visual selection.
Hardware Specification	Yes	All the experiments were run on a MacBook Pro laptop with a 3.1 GHz Intel Core i5 processor using R version 3.5.2 (R Core Team, 2018).
Software Dependencies	Yes	All the experiments were run on a MacBook Pro laptop with a 3.1 GHz Intel Core i5 processor using R version 3.5.2 (R Core Team, 2018).
Experiment Setup	Yes	A synthetic data set, parametrised by the noise term σ and an integer n, is constructed as follows. First, we randomly remove n rows from the data, after which Gaussian noise with variance σ² is added to the remaining variables, and finally all variables are rescaled to zero mean and unit variance.
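
The tile-merging pseudocode quoted in the Pseudocode row can be read as the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation (which is in R, in the linked repository). The function name `merge_tile`, the list-of-lists representation, and the 0-based indices are hypothetical choices made here. One deliberate deviation: the sketch takes the largest tile ID over the whole matrix rather than over T(R, C) as in the quoted text, so that a newly issued ID cannot coincide with a tile outside the merged region.

```python
def merge_tile(T, R, C):
    """Return a copy of tiling T merged with the tile t = (R, C).

    T is a list of rows; T[i][j] is the ID of the tile containing cell (i, j).
    R and C are the (0-based) row and column indices of the new tile.
    """
    T = [list(row) for row in T]          # work on a copy, leave the input intact
    m = len(T[0])

    # Group the rows of R by the pattern of tile IDs they show over columns C.
    groups = {}                            # key: tuple of IDs -> list of rows
    for i in R:
        key = tuple(T[i][c] for c in C)
        groups.setdefault(key, []).append(i)

    # Assumption: maximum over the whole tiling, not only over T(R, C).
    p_max = max(max(row) for row in T)

    for key, rows in groups.items():
        ids = set(key)                     # tiles touched by this group of rows
        # Columns in which every row of the group lies in one of those tiles.
        cols = [c for c in range(m) if all(T[i][c] in ids for i in rows)]
        p_max += 1                         # fresh ID for the merged tile
        for i in rows:
            for c in cols:
                T[i][c] = p_max
    return T


tiling = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 3, 3],
          [2, 2, 3, 3]]
merged = merge_tile(tiling, R=[0, 1, 2], C=[1, 2])
# merged == [[4, 4, 4, 4], [4, 4, 4, 4], [5, 5, 5, 5], [2, 2, 3, 3]]
```

In this toy tiling, rows 0 and 1 share the same ID pattern over the selected columns and end up in one new tile, while row 2 is split from row 3 and receives its own new tile.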
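
The perturbation procedure quoted in the Experiment Setup row (remove n rows, add Gaussian noise with variance σ², rescale) might look roughly like the sketch below. The function name `perturb`, the `seed` parameter, and the use of the population standard deviation for rescaling are assumptions made for illustration; the quoted text does not specify them, and the authors' own code is in R.

```python
import random
import statistics


def perturb(data, n_remove, sigma, seed=None):
    """Drop n_remove random rows, add N(0, sigma^2) noise, then standardise.

    data is a list of equal-length rows of floats. Columns are rescaled to
    zero mean and unit variance (population convention, an assumption here).
    A column that ends up constant would cause a division by zero.
    """
    rng = random.Random(seed)

    # 1. Randomly choose which rows survive.
    keep = sorted(rng.sample(range(len(data)), len(data) - n_remove))

    # 2. Add independent Gaussian noise to every remaining value.
    noisy = [[x + rng.gauss(0.0, sigma) for x in data[i]] for i in keep]

    # 3. Rescale each column to zero mean and unit variance.
    cols = list(zip(*noisy))
    means = [statistics.fmean(col) for col in cols]
    sds = [statistics.pstdev(col) for col in cols]
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)]
            for row in noisy]


# Hypothetical toy input: 20 rows, 2 variables.
toy = [[float(i), float((2 * i) % 7)] for i in range(20)]
perturbed = perturb(toy, n_remove=5, sigma=0.5, seed=0)
```

Fixing `seed` makes a single perturbation repeatable, which is convenient when comparing results across values of σ and n.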