reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Flexible Bayesian Nonlinear Model Configuration

Authors: Aliaksandr Hubin, Geir Storvik, Florian Frommlet

JAIR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In various applications, we illustrate how our approach is used to obtain meaningful nonlinear models. Additionally, we compare its predictive performance with several machine learning algorithms. Section 5 (Applications) includes specific examples such as 'Binary Classification' (breast cancer and spam), 'Prediction of Metric Outcome' (abalone shell age prediction), and 'Model Inference' (Kepler's law, epigenetic data). The paper presents numerous tables (e.g., Tables 2, 3, 5, 8, 10, 12, 13, 14, 15, 16, 17, 18) detailing comparative performance metrics (ACC, FPR, FNR, RMSE, MAE, CORR) across different algorithms and settings.
Researcher Affiliation	Academia	Aliaksandr Hubin from Department of Mathematics, University of Oslo and Norwegian Computing Center. Geir Storvik from Department of Mathematics, University of Oslo. Florian Frommlet from CEMSIIS, Medical University of Vienna. All listed institutions (University of Oslo, Medical University of Vienna, and Norwegian Computing Center) are academic or public research institutions, and the email domains (`.uio.no`, `.meduniwien.ac.at`) are consistent with academic affiliations.
Pseudocode	Yes	Algorithm 1: MJMCMC, one iteration from current model m. Algorithm 2: GMJMCMC
Open Source Code	Yes	R package: R package EMJMCMC for doing inference in the BGNLM model (R)(G) MJMCMC (Hubin et al., 2021b) which is available on GitHub at http://aliaksah.github.io/EMJMCMC2016. Data and code: supplementary data and code for all the examples as well as the excel sheets for all of the results are given in Hubin et al. (2021a) which is available on GitHub at https://github.com/aliaksah/EMJMCMC2016/tree/master/supplementaries/BGNLM.
Open Datasets	Yes	Breast cancer data can be downloaded from https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic). Spam emails data can be downloaded from https://archive.ics.uci.edu/ml/datasets/spambase. The Abalone data set (Nash, Sellers, Talbot, Cawthorn, & Ford, 1994), downloaded from https://archive.ics.uci.edu/ml/datasets/Abalone. Exoplanet data were originally collected and continues to be updated by Hanno Rein at the Open Exoplanet Catalogue GitHub (https://github.com/OpenExoplanetCatalogue/) repository (Rein, 2016). Epigenetic data was obtained from the NCBI GEO archive (Barrett et al., 2013). NEO objects data can be downloaded from 2016.spaceappschallenge.org.
Dataset Splits	Yes	Breast cancer data: A randomly selected quarter of the images was used as a training data set, the remaining images were used as a test set. Spam classification data: The data were randomly divided into a training set of 1536 emails and a test set of the remaining 3065 emails. Abalone shell age prediction: a total of 4177 observations are present, of which 3177 randomly chosen observations were used for training and the remaining 1000 observations were used for testing for all of the compared approaches. NEO Asteroids Classification: The training sample consisted of n = 64 objects (32 of which are potentially hazardous objects, whilst the other 32 are not) and the test sample of the remaining np = 20702 objects.
Hardware Specification	No	The paper mentions: 'Last but not least, we also acknowledge the HPC cluster from sigma2.no for providing us with the computational resources used for obtaining the results of this paper.' This statement refers to a high-performance computing cluster but does not provide any specific details about the CPU, GPU, or memory used.
Software Dependencies	No	The paper states: 'The corresponding R libraries, functions, and their tuning parameter settings are described in supplementary scripts.' and 'We used the same data to identify the underlying mathematical expression using the Symbolic Regressor routine within the Python library gplearn (https://gplearn.readthedocs.io/en/stable/).' While it mentions R libraries and the Python library gplearn, it does not specify any version numbers for these software components, which is required for a reproducible description of dependencies.
Experiment Setup	Yes	The paper provides extensive details on the experimental setup, including hyperparameters and configuration settings for various applications. For instance, in the breast cancer example: 'we used D = 6, L = 20 and Q = 20 for BGNLM' and 'The Bayesian model uses the model structure prior (3) with a = e^-2. The set of nonlinear transformations is defined as G = {gauss(x), tanh(x), atan(x), sin(x)}'. For the spam classification, 'the hyper-parameters Q and L and the population size of the GMJMCMC algorithm, which are all set to 100'. For abalone prediction: 'The set of nonlinear transformations is now G = {sigmoid(x), exp( |x|), log(|x| + 1), |x|^(1/3), |x|^(5/2), |x|^(7/2)} where selection probabilities PG are again uniform. Furthermore, the restrictions D = 6 for the depth, L = 15 for the local width and Q = 15 for the maximum number of features per model are applied. We present results both for a = e^-2 and a = e^-log n in the prior on model structures'.
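
Illustration (not taken from the paper): the random train/test splits quoted in the Dataset Splits row could be reproduced along the following lines. This is a minimal sketch using only the split sizes reported above. The function name `random_split`, the `SPLITS` dictionary, and the seed value are hypothetical; the entry does not say how the authors drew their splits or which seed they used, and their own scripts are the R supplementaries linked in the Open Source Code row.

```python
import random

# Train/test sizes as quoted in the Dataset Splits row.
SPLITS = {
    "spam": (1536, 3065),     # 4601 emails in total
    "abalone": (3177, 1000),  # 4177 observations in total
}


def random_split(n_total, n_train, seed=0):
    """Randomly partition range(n_total) into train and test index lists.

    The seed is an assumption made here for repeatability; it is not
    reported in the entry above.
    """
    if not 0 <= n_train <= n_total:
        raise ValueError("n_train must lie between 0 and n_total")
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


for name, (n_train, n_test) in SPLITS.items():
    train_idx, test_idx = random_split(n_train + n_test, n_train)
    print(name, len(train_idx), len(test_idx))
```

Fixing the seed makes the partition repeatable across runs, but it will not match the authors' partition unless their procedure and seed are recovered from the supplementary scripts.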