Flexible Model Aggregation for Quantile Regression
Authors: Rasool Fakoor, Taesup Kim, Jonas Mueller, Alexander J. Smola, Ryan J. Tibshirani
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories. |
| Researcher Affiliation | Collaboration | Rasool Fakoor (Amazon Web Services), Taesup Kim (Seoul National University), Jonas Mueller (Cleanlab), Alexander J. Smola (Amazon Web Services), Ryan J. Tibshirani (Amazon Web Services; Carnegie Mellon University) |
| Pseudocode | No | The paper describes methods and procedures in narrative text and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce all of our experimental results is available at: https://github.com/amazon-research/quantile-aggregation. |
| Open Datasets | Yes | We examine 34 data sets in total: 8 from the UCI Machine Learning Repository (Dua and Graff, 2017) and 26 from the Auto ML Benchmark for Regression from the Open ML Repository (Vanschoren et al., 2013). |
| Dataset Splits | Yes | For each of the 34 data sets studied, we average all results over 5 random train-validation-test splits, of relative size 72% (train), 18% (validation), and 10% (test). |
| Hardware Specification | No | The paper mentions "deep learning toolkits that can leverage hardware accelerators (GPUs)" but does not specify any particular GPU models, CPU models, or other hardware details used for the experiments. |
| Software Dependencies | No | The paper mentions using "Adam (Kingma and Ba, 2015)" and the "ELU activation function (Clevert et al., 2016)" as well as the "scikit-garden" and "LightGBM" implementations, but does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | We optimize each of these neural network models using Adam (Kingma and Ba, 2015), and using the ELU activation function (Clevert et al., 2016). We adaptively vary the mini-batch size depending on the data set size. They also share the same architecture/optimization hyperparameter search space: # of fully connected layers: {2, 3}, # of hidden units: {64, 128}, dropout ratio: {0.0, 0.05, 0.1}, learning rate: {1e-3, 3e-4}, weight decay: {1e-5, 1e-7}. In all settings, we use early stopping: the validation loss is evaluated every epoch and, if it has not decreased for the last 500 updates, optimization is stopped and the epoch with the lowest validation loss is returned. For the random forest models, we use the scikit-garden (https://scikit-garden.github.io/) implementation for both, and both have the same hyperparameter search space: minimum # of samples for splitting nodes: {8, 16, 64}, minimum # of samples for leaf nodes: {8, 16, 64}. For the gradient boosting model, the hyperparameter space is: # of leaves: {10, 50, 100}, minimum child samples: {3, 9, 15}, minimum child weight: {1e-2, 1e-1, 1}, subsample ratio: {0.4, 0.6, 0.8}, subsample ratio of columns: {0.4, 0.6}, ℓ1 regularization weight: {1e-1, 1, 5}, ℓ2 regularization weight: {1e-1, 1, 5}. For the global aggregators, the hyperparameter search space we use is: crossing penalty weight λ: {0.5, 1.0, 2.0, 5.0, 10.0}, crossing penalty margin scaling δ0: {1e-1, 5e-2, 1e-2, 1e-3, 1e-4}. For the local aggregators, the hyperparameter search space additionally includes: # of layers: {2, 3}, # of hidden units: {64, 128}, dropout ratio: {0.0, 0.05, 0.1}. |
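The neural-network search space quoted above is a plain grid, so its size can be enumerated directly. A minimal sketch, assuming a simple dict-of-lists representation (the names `nn_search_space` and `grid` are illustrative, not the authors' actual code):

```python
from itertools import product

# Hypothetical encoding of the neural-network hyperparameter grid
# quoted in the experiment-setup cell above.
nn_search_space = {
    "num_layers": [2, 3],
    "hidden_units": [64, 128],
    "dropout": [0.0, 0.05, 0.1],
    "learning_rate": [1e-3, 3e-4],
    "weight_decay": [1e-5, 1e-7],
}

def grid(space):
    """Yield every hyperparameter configuration in the grid, as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(nn_search_space))
print(len(configs))  # 2 * 2 * 3 * 2 * 2 = 48 configurations
```

The same pattern applies to the forest, boosting, and aggregator grids; only the key/value lists change.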