Calibrated Multiple-Output Quantile Regression with Representation Learning
Authors: Shai Feldman, Stephen Bates, Yaniv Romano
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on both real and synthetic data show that our method constructs regions that are significantly smaller compared to existing techniques. |
| Researcher Affiliation | Academia | Shai Feldman (EMAIL), Department of Computer Science, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel; Stephen Bates (EMAIL), Departments of Electrical Engineering and Computer Science and of Statistics, University of California, Berkeley, Berkeley, CA 94720, USA; Yaniv Romano (EMAIL), Departments of Electrical and Computer Engineering and of Computer Science, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel |
| Pseudocode | Yes | Algorithm 1: Spherically transformed DQR (ST-DQR) Algorithm 2: Calibrating Multivariate Quantile Regression |
| Open Source Code | Yes | Software implementing the proposed method and reproducing our experiments can be found at https://github.com/Shai128/mqr |
| Open Datasets | Yes | blog_data: BlogFeedback data set. https://archive.ics.uci.edu/ml/datasets/BlogFeedback. Accessed: January, 2019. bio: Physicochemical properties of protein tertiary structure data set. https://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure. Accessed: January, 2019. house: House sales in King County, USA. https://www.kaggle.com/harlfoxem/housesalesprediction/metadata. Accessed: July, 2021. meps_19: Medical Expenditure Panel Survey, panel 19. https://meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-181. Accessed: January, 2019. |
| Dataset Splits | Yes | We split the data sets (both real and synthetic) into a training set (38.4%), calibration (25.6%), validation set (16%) used for early stopping, and a test set (20%) to evaluate performance. |
| Hardware Specification | Yes | CPU: Intel(R) Xeon(R) E5-2650 v4. GPU: Nvidia TITAN-X, 1080TI, 2080TI. OS: Ubuntu 18.04. |
| Software Dependencies | No | The optimizer is Adam (Kingma and Ba, 2015), and the batch size is 256 for all methods. using Gurobi solver (Gurobi Optimization, LLC, 2021). |
| Experiment Setup | Yes | The neural network consists of 3 layers of 64 hidden units, and a leaky ReLU activation function with parameter 0.2. The learning rate used is 1e-3, the optimizer is Adam (Kingma and Ba, 2015), and the batch size is 256 for all methods. The maximum number of epochs is 10000, but the training is stopped early if the validation loss does not improve for 100 epochs, and in this case, the model with the lowest loss is chosen. The number of distinct directions used in each gradient step is 32, and they are taken from a fixed collection of 2048 directions that were sampled once, before the training process. The number of directions used to determine the quantile region belonging is 256, and they are sampled from the same collection of directions. |