Linear Mode Connectivity in Differentiable Tree Ensembles
Authors: Ryuichi Kanoh, Mahito Sugiyama
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results, which are highlighted with a green line in the top left panel of Figure 1, clearly show that the answer to our research question is Yes. This plot shows the variation in test accuracy when interpolating weights of soft oblivious trees... We empirically evaluate barriers in soft tree ensembles to examine LMC. |
| Researcher Affiliation | Academia | 1National Institute of Informatics 2The Graduate University for Advanced Studies, SOKENDAI EMAIL |
| Pseudocode | Yes | We present the straightforward matching procedure in Algorithms 1 and 2. |
| Open Source Code | Yes | The reproducible PyTorch (Paszke et al., 2019) implementation is provided in the supplementary material. |
| Open Datasets | Yes | In our experiments, we employed Tabular-Benchmark (Grinsztajn et al., 2022), a collection of tabular datasets suitable for evaluating tree ensembles. Details of datasets are provided in Section A in Appendix. [...] Table 5: Summary of the datasets used in the experiments. Dataset N F Link (with OpenML links for each dataset) |
| Dataset Splits | Yes | As proposed in Grinsztajn et al. (2022), we randomly sampled 10,000 instances for train and test data from each dataset. If the dataset contains fewer than 20,000 instances, they are randomly divided into halves for train and test data. [...] Furthermore, we conducted experiments with split data following the protocol in Ainsworth et al. (2023) and Jordan et al. (2023), where the initial split consists of randomly sampled 80% negative and 20% positive instances, and the second split inverts these ratios. |
| Hardware Specification | Yes | All experiments were conducted on a system equipped with an Intel Xeon E5-2698 CPU at 2.20 GHz, 252 GB of memory, and Tesla V100-DGXS-32GB GPU, running Ubuntu Linux (version 4.15.0-117-generic). |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)', 'Adam (Kingma & Ba, 2015)', and 'scipy.optimize.linear_sum_assignment' but does not provide specific version numbers for these software components. It only specifies the operating system version: 'Ubuntu Linux (version 4.15.0-117-generic)'. |
| Experiment Setup | Yes | We used three different learning rates η ∈ {0.01, 0.001, 0.0001} and adopted the one that yields the highest training accuracy for each dataset. The batch size is set at 512. [...] During training, we minimized cross-entropy using Adam (Kingma & Ba, 2015) with its default hyperparameters. Training is conducted for 50 epochs. To measure the barrier using Equation 1, experiments were conducted by interpolating between two models with λ ∈ {0, 1/24, ..., 23/24, 1}. |
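The barrier measurement described in the setup row can be sketched as follows. This is a hedged illustration, not the paper's code: the exact form of Equation 1 is not reproduced here, so the sketch assumes the common definition (the largest drop of interpolated accuracy below the linear baseline between the two endpoint accuracies), and `acc_fn` is a hypothetical evaluation callback.

```python
import numpy as np

def interpolate_params(theta_a, theta_b, lam):
    """Linear interpolation of two flattened parameter vectors:
    (1 - lam) * theta_a + lam * theta_b."""
    return (1.0 - lam) * theta_a + lam * theta_b

def barrier(acc_fn, theta_a, theta_b, num_points=25):
    """Approximate the barrier on a grid lambda in {0, 1/24, ..., 23/24, 1}
    (25 points, matching the paper's grid). Assumes the usual definition:
    max over lambda of (linear baseline of endpoint accuracies) minus
    (accuracy of the interpolated model)."""
    lams = np.linspace(0.0, 1.0, num_points)
    accs = np.array([acc_fn(interpolate_params(theta_a, theta_b, l)) for l in lams])
    baseline = (1.0 - lams) * accs[0] + lams * accs[-1]
    return float(np.max(baseline - accs))
```

With this definition, a barrier near zero on the interpolation path is the signature of linear mode connectivity that the paper tests for.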
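The matching procedure (Algorithms 1 and 2) relies on `scipy.optimize.linear_sum_assignment`, which the paper names explicitly. A minimal sketch of such assignment-based matching is shown below; the cost choice (negative inner product between flattened per-tree parameter vectors) and the function name `match_trees` are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_trees(params_a, params_b):
    """Align the trees of ensemble B with those of ensemble A.

    params_a, params_b: arrays of shape (num_trees, num_params), one
    flattened parameter vector per tree. Solves the assignment problem
    that maximizes total similarity (inner product) between matched trees.
    """
    # cost[i, j] = negative similarity between tree i of A and tree j of B;
    # linear_sum_assignment minimizes total cost, i.e. maximizes similarity.
    cost = -params_a @ params_b.T
    row_ind, col_ind = linear_sum_assignment(cost)
    return col_ind  # permutation: tree col_ind[i] of B is matched to tree i of A
```

Reordering ensemble B by the returned permutation before interpolating is what lets the permutation-symmetry between trees be removed prior to measuring the barrier.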