Joints in Random Forests
Authors: Alvaro Correia, Robert Peharz, Cassio P. de Campos
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our models often outperform common routines to treat missing data, such as K-nearest neighbour imputation, and moreover, that our models can naturally detect outliers by monitoring the marginal probability of input features. |
| Researcher Affiliation | Collaboration | Alvaro H. C. Correia EMAIL Eindhoven University of Technology Robert Peharz EMAIL Eindhoven University of Technology Cassio de Campos EMAIL Eindhoven University of Technology ... During part of the three years prior to the submission of this work, the authors were affiliated with the following institutions besides TU Eindhoven: Alvaro Correia was a full-time employee at Accenture and Itaú-Unibanco, and affiliated with Utrecht University; Cassio de Campos was affiliated with Queen's University Belfast and Utrecht University; Robert Peharz was affiliated with the University of Cambridge. |
| Pseudocode | Yes | Algorithm 1: Converting DT to PC (GeDT). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code (e.g., 'Our code is available at...'), nor a direct link to a code repository for the implemented methodology. It refers to 'LearnSPN [16]', a prominent PC learner, implying its use, but does not release the authors' specific implementation. |
| Open Datasets | Yes | We compare the accuracy of the methods in a selection of datasets from the OpenML-CC18 benchmark [51] and the wine-quality dataset [33]. ... We repeat a similar experiment with images, where we use the MNIST dataset [27] to fit a Gaussian KDE, a Random Forest and its corresponding GeF+. We then evaluate these models on different digit datasets, namely Semeion [11] and SVHN [34] (converted to grayscale and 784 pixels)... |
| Dataset Splits | Yes | Table 1 presents results for 30% of missing values at test time (different percentages are shown in the supp. material), with 95% confidence intervals across 10 repetitions of 5-fold cross-validation. ... We then compute the log-density of unseen data (70/30 train/test split) for the two wine types with both models. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions software tools like 'LearnSPN' but does not provide specific version numbers for any software dependencies, which would be required for a reproducible setup. |
| Experiment Setup | Yes | In all experiments, GeF, GeF(LearnSPN) and the RF share the exact same structure (partition over the feature space) and are composed of 100 trees; including more trees has been shown to yield only marginal gains in most cases [39]. In GeF(LearnSPN), we run LearnSPN only for leaves with more than 30 samples, defaulting to a fully factorised model in smaller leaves. |
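The missing-data setting described in the Research Type row compares the paper's models against imputation baselines such as K-nearest-neighbour imputation. A minimal sketch of that baseline, assuming the iris dataset, a 30% missingness rate, and scikit-learn's `KNNImputer` (all illustrative choices, not the authors' pipeline):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

# Load a small dataset and simulate 30% missing values at test time,
# mirroring the missing-data setting described in the paper.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
rng = np.random.default_rng(0)
mask = rng.random(X_test.shape) < 0.3
X_test_missing = X_test.copy()
X_test_missing[mask] = np.nan

# K-nearest-neighbour imputation baseline: fill the gaps, then classify.
imputer = KNNImputer(n_neighbors=5).fit(X_train)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
acc = clf.score(imputer.transform(X_test_missing), y_test)
print(f"accuracy with 30% missing test values: {acc:.3f}")
```

A generative forest, by contrast, can marginalise missing features directly instead of imputing them first.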
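The Dataset Splits row reports 95% confidence intervals over 10 repetitions of 5-fold cross-validation. That protocol can be sketched with scikit-learn's `RepeatedStratifiedKFold`; the wine dataset and the t-based interval are assumptions for illustration:

```python
from scipy import stats
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# 10 repetitions of 5-fold CV -> 50 fold scores per model.
X, y = load_wine(return_X_y=True)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=cv
)

# 95% confidence interval across the 50 fold scores (t-distribution).
mean = scores.mean()
half_width = stats.t.ppf(0.975, len(scores) - 1) * stats.sem(scores)
print(f"accuracy: {mean:.3f} ± {half_width:.3f}")
```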
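The Experiment Setup row states that LearnSPN is run only in leaves with more than 30 samples, with a fully factorised model elsewhere. A sketch of how that threshold partitions a fitted forest's leaves, using scikit-learn's tree internals (the counting loop is illustrative; the actual GeF construction replaces each leaf with a density model rather than counting):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

MIN_LEAF_SAMPLES = 30  # the paper's threshold for running LearnSPN in a leaf

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Inspect one tree: leaves are nodes with no left child (-1 in sklearn).
tree = forest.estimators_[0].tree_
leaf_ids = np.where(tree.children_left == -1)[0]
rich, factorised = 0, 0
for leaf in leaf_ids:
    if tree.n_node_samples[leaf] > MIN_LEAF_SAMPLES:
        rich += 1        # large leaf: would fit a LearnSPN model here
    else:
        factorised += 1  # small leaf: default to a fully factorised model
print(f"{rich} LearnSPN leaves, {factorised} fully factorised leaves")
```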
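The paper's outlier-detection claim rests on monitoring the marginal probability of input features, and its image experiment fits a Gaussian KDE for comparison. A toy version of that idea, with synthetic 2-D Gaussians and a 1%-quantile density threshold standing in for the MNIST/Semeion/SVHN setup:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
in_dist = rng.normal(0.0, 1.0, size=(500, 2))   # "training" distribution
outliers = rng.normal(8.0, 0.5, size=(5, 2))    # far-away test inputs

# Fit a Gaussian KDE on in-distribution data and flag inputs whose
# log-density falls below the 1st percentile of training log-densities,
# analogous to monitoring the marginal probability of input features.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(in_dist)
threshold = np.quantile(kde.score_samples(in_dist), 0.01)
flags = kde.score_samples(outliers) < threshold
print(f"flagged {flags.sum()} of {len(outliers)} outliers")
```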