Expected Pinball Loss For Quantile Regression And Inverse CDF Estimation
Authors: Taman Narayan, Serena Lutong Wang, Kevin Robert Canini, Maya Gupta
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in Section 5 on simulations and real-world data show that the proposed non-crossing DLNs provide competitive, trustworthy estimates. ... Section 5.3 Model Architecture Experiments: We start by demonstrating the efficacy of using monotonic DLNs to predict the inverse CDF on simulations, and then on the real data. ... Table 3: Simulations: Quantile MSE and percent crossing violations for τ ∈ {0.01, 0.02, . . . , 0.99}. ... Table 5: Real data experiments: Pinball loss on the test set, averaged over τ ∈ {0.01, 0.02, . . . , 0.99}. |
| Researcher Affiliation | Collaboration | Taman Narayan (Google Research); Serena Wang (Google Research; University of California, Berkeley); Kevin Canini (Google Research); Maya R. Gupta (University of Washington) |
| Pseudocode | No | The paper describes mathematical formulations and discusses algorithms like Dykstra's projection algorithm, but it does not contain a dedicated section or figure presenting pseudocode or an algorithm block for the proposed methodology. |
| Open Source Code | Yes | Code is available at github.com/google-research/google-research/tree/master/quantile_regression. |
| Open Datasets | Yes | Air Quality: The Beijing Multi-Site Air-Quality dataset from UCI (Zhang et al., 2017) ... Puzzles: ... The anonymized dataset is publicly available at www.mayagupta.org/data/PuzzleClub_HoldTimes.csv. ... Wine: We used the Wine Reviews dataset from Kaggle (Bahri, 2018). |
| Dataset Splits | Yes | Air Quality: ...earlier examples forming a training set of size 252,481, later examples a validation set of size 84,145, and most recent examples a test set of size 84,145. ... Puzzles: The 984 train and 247 validation examples are IID from past data, while the 211 test samples are the most recent samples ... Traffic: We used 1,000 examples each for training, validation, and testing ... Wine: The data was split IID with 84,641 examples for training, 12,091 for validation, and 24,184 for testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU models, CPU models, or cloud computing instance types. It mentions software frameworks like TensorFlow but not the underlying hardware. |
| Software Dependencies | Yes | We used Keras models in TensorFlow 2.2 for the unrestricted DNN comparisons... For DLNs, we used the TensorFlow Lattice library... For all DNN and DLN experiments, we use the Adam optimizer (Kingma & Ba, 2015) with its default learning rate of 0.001. |
| Experiment Setup | Yes | All hyperparameters were optimized on validation sets. ... For DNN models, we validated the number of hidden layers and the hidden dimension. For the SQF-DNN, we also validated the number of distribution keypoints. For the smaller DLN models, we used the common two-layer calibrated lattice architecture ... and validated over its number of calibration keypoints and lattice vertices. ... For both DLNs and DNNs, we additionally validated over the number of training epochs. ... The number of calibration keypoints for the piecewise-linear calibration function over τ was tuned over {10, 20, 50, 100}. The number of lattice keypoints for τ was tuned over {2, 3, 5, 7, 10}. Other feature calibration keypoints were tuned over {5, 10, 15, 20}. Step sizes were tuned over {0.001, 0.005, 0.01, 0.05, 0.1}; minibatch sizes were tuned over {1000, 10000}. The number of steps was tuned over {100, 1000, 10000}. |
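For reference, the pinball loss that the paper's title and Table 5 center on is the standard quantile-regression loss. The sketch below is a minimal NumPy implementation of that standard loss, averaged over the same quantile grid τ ∈ {0.01, 0.02, . . . , 0.99} used in the paper's tables; it is not the authors' code (their implementation is in the linked `quantile_regression` repository), and the function names here are our own.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Standard pinball (quantile) loss for a quantile level tau in (0, 1).

    Penalizes under-prediction by tau and over-prediction by (1 - tau),
    so minimizing it yields the tau-th conditional quantile.
    """
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def mean_pinball_over_grid(y_true, y_pred_per_tau, taus):
    """Average pinball loss over a grid of quantile levels.

    y_pred_per_tau: dict mapping each tau to its predictions, mirroring
    how Table 5 averages test-set pinball loss over tau in {0.01, ..., 0.99}.
    """
    return float(np.mean([pinball_loss(y_true, y_pred_per_tau[t], t)
                          for t in taus]))

# The quantile grid used throughout the paper's experiments:
taus = np.round(np.arange(0.01, 1.00, 0.01), 2)
```

Note the asymmetry: at τ = 0.5 the loss is half the absolute error, while at τ = 0.9 an under-prediction of 1 costs 0.9 but an over-prediction of 1 costs only 0.1.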