Decoding-based Regression

Authors: Xingyou Song, Dara Bahri

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. In experimental benchmarks, we find that properly tuned decoding-based regression heads are data-efficient and competitive with regular pointwise heads on tabular regression, yet are also expressive enough to compete against Gaussian mixture heads for density estimation. Our main goals for experiments are to:
- Demonstrate that decoder heads can be effective swap-in replacements for common pointwise regression heads.
- Establish the density estimation capabilities of the decoding-based head over any distribution over R.
- Ablate the effect of decoder head size and of sequence-specific methods, such as error correction, on performance.
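The core idea behind a decoding-based regression head is to represent a scalar target as a sequence of discrete tokens that the model decodes one at a time. A minimal sketch of such a tokenizer, assuming the normalized representation (base-B digit sequences of length K over a target scaled into [0, 1)) described in the experiment setup; function names and defaults here are illustrative, not the authors' code:

```python
def tokenize(y, base=4, length=8):
    """Encode a scalar y in [0, 1) as `length` base-`base` digit tokens."""
    tokens = []
    for _ in range(length):
        y *= base
        digit = int(y)       # next most-significant base-`base` digit
        tokens.append(digit)
        y -= digit           # keep the remaining fraction in [0, 1)
    return tokens

def detokenize(tokens, base=4):
    """Decode digit tokens back to a scalar in [0, 1)."""
    y = 0.0
    for i, d in enumerate(tokens):
        y += d * base ** -(i + 1)
    return y
```

Since the decoded value is a truncated base-B expansion, the reconstruction error is bounded by base^-length, which is why sweeping base and sequence length trades off vocabulary size against precision.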
Researcher Affiliation: Industry. Xingyou Song, Dara Bahri (Google DeepMind; equal contribution).
Pseudocode: No. The paper describes methods and procedures in paragraph text and mathematical formulations but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code: No. The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository in the main text or appendices.
Open Datasets: Yes. We obtained the datasets from https://github.com/treforevans/uci_datasets, which preprocessed and scraped the data from the official UCI website at https://archive.ics.uci.edu/. ... synthetic continuous objectives from the Black-box Optimization Benchmarking (BBOB) suite (Elhara et al., 2019) ... real-world OpenML (Vanschoren et al., 2013) regression tasks from OpenML-CTR23 (Fischer et al., 2023) and AMLB (Gijsbers et al., 2024) ... UCI regression repository (Dua & Graff, 2017)
Dataset Splits: Yes. All models were trained for a maximum of 300 epochs. To prevent overfitting, we apply early stopping (patience=5), where the validation split is 0.1 on the training set. ... Avg. NLL (± Std Dev) of test examples on UCI datasets over 10 train-test splits. ... Each point was averaged over 10 training runs over random combinations of datapoints from the original AMLB task's training set.
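The split-and-early-stop protocol quoted here (a 0.1 validation fraction held out from the training set, stopping once validation loss has not improved for 5 epochs) can be sketched generically; the helper names below are illustrative, not from the paper:

```python
import random

def train_val_split(examples, val_frac=0.1, seed=0):
    """Hold out a validation fraction from the training set."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(examples) * val_frac)
    val = [examples[i] for i in idx[:n_val]]
    train = [examples[i] for i in idx[n_val:]]
    return train, val

def should_stop(val_losses, patience=5):
    """True once the best validation loss is at least `patience` epochs old."""
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best >= patience
```

In a training loop, `should_stop` would be checked after each epoch's validation pass, with the model checkpoint from the best epoch retained.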
Hardware Specification: Yes. For the vast majority of tabular regression problems, we found that the process of training and tuning only requires at most 20 minutes on a single Nvidia P100 GPU, making the decoder head relatively cheap to use.
Software Dependencies: No. The paper mentions various software components and concepts like 'MLP', 'Transformer', 'Adam learning rates', and 'softmax', but it does not specify any version numbers for programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup: Yes. For all models, we swept the encoder (basic MLP with ReLU activation) by varying the number of layers within [2, 3, 4, 5] and hidden units within [256, 512, 2048]. ... All models were trained for a maximum of 300 epochs. To prevent overfitting, we apply early stopping (patience=5), where the validation split is 0.1 on the training set. Adam learning rates were swept over [1e-4, 5e-4]. ... Weight decay: [0.0, 0.1, 1.0] ...
- Unnormalized decoder: Base B: [4, 8, 10]; Exponent Digit Count E: [1, 2, 4]; Mantissa Digit Count M: [2, 4, 8]; Transformer size: (3 layers, 128 units, 4 heads) or (1 layer, 32 units, 1 head). ...
- Normalized decoder: Base B: [2, 4, 8]; Length K: [4, 8, 6]; Transformer size: same as unnormalized decoder. ...
- Bin Count: [16, 64, 256, 1024, 4096, 16384] ...
- Mixtures M: [1, 2, 5, 10, 20, 50, 1000]
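The unnormalized decoder's hyperparameters (base B, exponent digit count E, mantissa digit count M) suggest a sign/exponent/mantissa token layout, analogous to scientific notation in base B. A hypothetical sketch of such an encoding, with an assumed exponent bias so all tokens are non-negative; this is an illustration of the parameterization, not the paper's implementation:

```python
import math

def float_to_tokens(y, base=10, exp_digits=1, mant_digits=4):
    """Illustrative unnormalized encoding: a sign token, `exp_digits`
    base-`base` exponent digits (biased to be non-negative), then
    `mant_digits` mantissa digits of the fraction in [1/base, 1)."""
    sign = '+' if y >= 0 else '-'
    y = abs(y)
    if y == 0:
        exp, frac = 0, 0.0
    else:
        exp = math.floor(math.log(y, base)) + 1  # frac lands in [1/base, 1)
        frac = y / base ** exp
    bias = base ** exp_digits // 2               # assumed: center the exponent range
    e = exp + bias                               # assumed to fit in exp_digits digits
    exp_tokens = [(e // base ** i) % base for i in reversed(range(exp_digits))]
    mant_tokens = []
    for _ in range(mant_digits):
        frac *= base
        d = int(frac)
        mant_tokens.append(d)
        frac -= d
    return [sign] + exp_tokens + mant_tokens
```

Under this layout the sweep in the setup above trades off range (via E) against precision (via M) at a fixed per-token vocabulary of size B, which is one plausible reason the paper ablates all three.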