Decoding-based Regression
Authors: Xingyou Song, Dara Bahri
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experimental benchmarks, we find that properly tuned decoding-based regression heads are data-efficient and competitive with regular pointwise heads on tabular regression, yet are also expressive enough to perform against Gaussian mixture heads for density estimation. Our main goals for experiments are to: Demonstrate decoder heads can be effective swap-in replacements to common pointwise regression heads. Establish the density estimation capabilities of the decoding-based head over any distribution over R. Ablate the effect of decoder head size and sequence-specific methods such as error correction on performance. |
| Researcher Affiliation | Industry | Xingyou Song, Dara Bahri. Google DeepMind. Equal Contribution. |
| Pseudocode | No | The paper describes methods and procedures in paragraph text and mathematical formulations but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository in the main text or appendices. |
| Open Datasets | Yes | We obtained the datasets from https://github.com/treforevans/uci_datasets which preprocessed and scraped the data from the official UCI website at https://archive.ics.uci.edu/. ... synthetic continuous objectives from the Black-box Optimization Benchmarking (BBOB) suite (Elhara et al., 2019) ... real-world OpenML (Vanschoren et al., 2013) regression tasks from OpenML-CTR23 (Fischer et al., 2023) and AMLB (Gijsbers et al., 2024) ... UCI regression repository (Dua & Graff, 2017) |
| Dataset Splits | Yes | All models were trained with a maximum of 300 epochs. To prevent overfitting, we apply early stopping (patience=5) where the validation split is 0.1 on the training set. ... Avg. NLL (Std Dev) of test examples on UCI datasets over 10 train-test splits. ... Each point was averaged over 10 training runs over random combinations of datapoints from the original AMLB task's training set. |
| Hardware Specification | Yes | For the vast majority of tabular regression problems, we found that the process of training and tuning only requires at most 20 minutes on a single Nvidia P100 GPU, making the decoder head relatively cheap to use. |
| Software Dependencies | No | The paper mentions various software components and concepts like 'MLP', 'Transformer', 'Adam learning rates', and 'softmax', but it does not specify any version numbers for programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For all models, we swept the encoder (basic MLP with ReLU activation) by varying the number of layers within [2, 3, 4, 5] and hidden units within [256, 512, 2048]. ... All models were trained with a maximum of 300 epochs. To prevent overfitting, we apply early stopping (patience=5) where the validation split is 0.1 on the training set. Adam learning rates were swept over [1e-4, 5e-4]. ... Weight decay: [0.0, 0.1, 1.0] ... Base B: [4, 8, 10] Exponent Digit Count E: [1, 2, 4] Mantissa Digit Count M: [2, 4, 8] Transformer size: (3 layers, 128 units, 4 heads) or (1 layer, 32 units, 1 head). ... Base B: [2, 4, 8] Length K: [4, 8, 6] Transformer size: Same as unnormalized decoder. ... Bin Count: [16, 64, 256, 1024, 4096, 16384] ... Mixtures M: [1, 2, 5, 10, 20, 50, 1000] |
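The normalized-decoder settings quoted in the Experiment Setup row (base B, sequence length K) can be illustrated with a minimal sketch, assuming the usual scheme of representing a normalized target in [0, 1) as K base-B digit tokens; the function names here are hypothetical and not from the paper:

```python
def to_tokens(y: float, base: int = 4, length: int = 8) -> list[int]:
    """Tokenize a normalized value y in [0, 1) as `length` base-`base` digits."""
    assert 0.0 <= y < 1.0
    tokens = []
    for _ in range(length):
        y *= base
        digit = int(y)       # most-significant digit first
        tokens.append(digit)
        y -= digit           # keep the fractional remainder for the next digit
    return tokens


def from_tokens(tokens: list[int], base: int = 4) -> float:
    """Decode digit tokens back to a value (lower edge of the implied bin)."""
    y = 0.0
    for d in reversed(tokens):
        y = (y + d) / base
    return y
```

Under this scheme the reconstruction error is at most base**-length, which is why sweeping B and K trades off vocabulary size against sequence length and resolution.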
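The encoder sweep described above is a plain grid search; a minimal sketch of enumerating it, using the quoted ranges (the dictionary keys are hypothetical names, not the paper's):

```python
import itertools

# Sweep grid mirroring the quoted MLP encoder and optimizer settings.
grid = {
    "num_layers": [2, 3, 4, 5],
    "hidden_units": [256, 512, 2048],
    "learning_rate": [1e-4, 5e-4],
    "weight_decay": [0.0, 0.1, 1.0],
}

# Cartesian product of all settings: 4 * 3 * 2 * 3 = 72 configurations.
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
```

Each resulting config dict would then be trained for up to 300 epochs with early stopping (patience=5) on the 0.1 validation split, per the quoted setup.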