Decoding-based Regression

Authors: Xingyou Song, Dara Bahri

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. In experimental benchmarks, we find that properly tuned decoding-based regression heads are data-efficient and competitive with regular pointwise heads on tabular regression, yet are also expressive enough to compete against Gaussian mixture heads for density estimation. Our main goals for experiments are to:
- Demonstrate that decoder heads can be effective swap-in replacements for common pointwise regression heads.
- Establish the density estimation capabilities of the decoding-based head over any distribution over R.
- Ablate the effect of decoder head size and of sequence-specific methods, such as error correction, on performance.
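The core idea behind a decoding-based regression head is to represent a scalar target as a sequence of discrete tokens that the model decodes one at a time. A minimal sketch of such a tokenizer, assuming the normalized representation (base-B digit sequences of length K over a target scaled into [0, 1)) described in the experiment setup; function names and defaults here are illustrative, not the authors' code:

```python
def tokenize(y, base=4, length=8):
    """Encode a scalar y in [0, 1) as `length` base-`base` digit tokens."""
    tokens = []
    for _ in range(length):
        y *= base
        digit = int(y)       # next most-significant base-`base` digit
        tokens.append(digit)
        y -= digit           # keep the remaining fraction in [0, 1)
    return tokens

def detokenize(tokens, base=4):
    """Decode digit tokens back to a scalar in [0, 1)."""
    y = 0.0
    for i, d in enumerate(tokens):
        y += d * base ** -(i + 1)
    return y
```

Since the decoded value is a truncated base-B expansion, the reconstruction error is bounded by base^-length, which is why sweeping base and sequence length trades off vocabulary size against precision.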
Researcher Affiliation: Industry. Xingyou Song, Dara Bahri (Google DeepMind; equal contribution).
Pseudocode: No. The paper describes methods and procedures in paragraph text and mathematical formulations but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code: No. The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository in the main text or appendices.
Open Datasets: Yes. We obtained the datasets from https://github.com/treforevans/uci_datasets, which preprocessed and scraped the data from the official UCI website at https://archive.ics.uci.edu/. ... synthetic continuous objectives from the Black-box Optimization Benchmarking (BBOB) suite (Elhara et al., 2019) ... real-world OpenML (Vanschoren et al., 2013) regression tasks from OpenML-CTR23 (Fischer et al., 2023) and AMLB (Gijsbers et al., 2024) ... UCI regression repository (Dua & Graff, 2017)
Dataset Splits: Yes. All models were trained for a maximum of 300 epochs. To prevent overfitting, we apply early stopping (patience=5), where the validation split is 0.1 on the training set. ... Avg. NLL (± Std Dev) of test examples on UCI datasets over 10 train-test splits. ... Each point was averaged over 10 training runs over random combinations of datapoints from the original AMLB task's training set.
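The split-and-early-stop protocol quoted here (a 0.1 validation fraction held out from the training set, stopping once validation loss has not improved for 5 epochs) can be sketched generically; the helper names below are illustrative, not from the paper:

```python
import random

def train_val_split(examples, val_frac=0.1, seed=0):
    """Hold out a validation fraction from the training set."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(examples) * val_frac)
    val = [examples[i] for i in idx[:n_val]]
    train = [examples[i] for i in idx[n_val:]]
    return train, val

def should_stop(val_losses, patience=5):
    """True once the best validation loss is at least `patience` epochs old."""
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best >= patience
```

In a training loop, `should_stop` would be checked after each epoch's validation pass, with the model checkpoint from the best epoch retained.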
Hardware Specification: Yes. For the vast majority of tabular regression problems, we found that the process of training and tuning only requires at most 20 minutes on a single Nvidia P100 GPU, making the decoder head relatively cheap to use.
Software Dependencies: No. The paper mentions various software components and concepts like 'MLP', 'Transformer', 'Adam learning rates', and 'softmax', but it does not specify any version numbers for programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup: Yes. For all models, we swept the encoder (basic MLP with ReLU activation) by varying the number of layers within [2, 3, 4, 5] and hidden units within [256, 512, 2048]. ... All models were trained for a maximum of 300 epochs. To prevent overfitting, we apply early stopping (patience=5), where the validation split is 0.1 on the training set. Adam learning rates were swept over [1e-4, 5e-4]. ... Weight decay: [0.0, 0.1, 1.0] ...
- Unnormalized decoder: Base B: [4, 8, 10]; Exponent Digit Count E: [1, 2, 4]; Mantissa Digit Count M: [2, 4, 8]; Transformer size: (3 layers, 128 units, 4 heads) or (1 layer, 32 units, 1 head). ...
- Normalized decoder: Base B: [2, 4, 8]; Length K: [4, 8, 6]; Transformer size: same as unnormalized decoder. ...
- Bin Count: [16, 64, 256, 1024, 4096, 16384] ...
- Mixtures M: [1, 2, 5, 10, 20, 50, 1000]
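The unnormalized decoder's hyperparameters (base B, exponent digit count E, mantissa digit count M) suggest a sign/exponent/mantissa token layout, analogous to scientific notation in base B. A hypothetical sketch of such an encoding, with an assumed exponent bias so all tokens are non-negative; this is an illustration of the parameterization, not the paper's implementation:

```python
import math

def float_to_tokens(y, base=10, exp_digits=1, mant_digits=4):
    """Illustrative unnormalized encoding: a sign token, `exp_digits`
    base-`base` exponent digits (biased to be non-negative), then
    `mant_digits` mantissa digits of the fraction in [1/base, 1)."""
    sign = '+' if y >= 0 else '-'
    y = abs(y)
    if y == 0:
        exp, frac = 0, 0.0
    else:
        exp = math.floor(math.log(y, base)) + 1  # frac lands in [1/base, 1)
        frac = y / base ** exp
    bias = base ** exp_digits // 2               # assumed: center the exponent range
    e = exp + bias                               # assumed to fit in exp_digits digits
    exp_tokens = [(e // base ** i) % base for i in reversed(range(exp_digits))]
    mant_tokens = []
    for _ in range(mant_digits):
        frac *= base
        d = int(frac)
        mant_tokens.append(d)
        frac -= d
    return [sign] + exp_tokens + mant_tokens
```

Under this layout the sweep in the setup above trades off range (via E) against precision (via M) at a fixed per-token vocabulary of size B, which is one plausible reason the paper ablates all three.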