OmniPred: Language Models as Universal Regressors
Authors: Xingyou Song, Oscar Li, Chansoo Lee, Bangding Yang, Daiyi Peng, Sagi Perel, Yutian Chen
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | our extensive experiments demonstrate that language models are capable of very precise numerical regression using only textual representations of mathematical parameters and values, and if given the opportunity to train at scale over multiple tasks, can significantly outperform traditional regression models. |
| Researcher Affiliation | Collaboration | Xingyou Song1, Oscar Li2, Chansoo Lee1, Bangding (Jeffrey) Yang3, Daiyi Peng1, Sagi Perel1, Yutian Chen1 — 1Google DeepMind, 2Carnegie Mellon University, 3Google. Equal Contribution. Work performed as a student researcher at Google DeepMind. |
| Pseudocode | No | The paper describes the methodology in text and mathematical formulations (e.g., Equation 1) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | We do not release any of the trained checkpoints, as it may be possible to reverse-engineer parts of the training data, which can lead to privacy violations and data leakage. The paper mentions external open-source tools used, like T5X, Open Source Vizier, and Init2Winit, but does not provide specific source code for the OmniPred methodology itself. |
| Open Datasets | Yes | BBOB (Shifted): For precise controlled experiments where we can generate synthetic datasets and perform online evaluations, we create a multi-task version of the BBOB benchmark (ElHara et al., 2019) containing 24 different synthetic functions |
| Dataset Splits | Yes | (3) deciding on a fixed train/validation/test splitting ratio (default 0.8/0.1/0.1) |
| Hardware Specification | Yes | The model (~200M parameters) was pretrained using a 4x4 TPU V3. ... we used a single 1x1 TPU V3. |
| Software Dependencies | No | The paper mentions several software components like T5X (Raffel et al., 2020), the SentencePiece tokenizer (Kudo & Richardson, 2018), and XGBoost (Chen & Guestrin, 2016), but does not provide specific version numbers for these tools as used in their experiments. |
| Experiment Setup | Yes | Optimizer: Adafactor with base learning rate 0.01 and square root decay. Batch size 256. ... We use the same settings from pretraining for consistency, but allow a maximum of 30 epochs. ... Single-task training: ...larger constant learning rate of 1e-3... Finetuning: ...smaller fixed learning rate of 1e-5... We restrict the logits to only decode the custom floating point tokens for representing y-values. To maximize batch size for a 1x1 TPU V3, we generate 64 samples and select the empirical median of these floating point samples as our final prediction when computing prediction error. |
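The Dataset Splits row quotes a default 0.8/0.1/0.1 train/validation/test ratio. A minimal sketch of such a split, assuming a simple shuffled partition (the function name and seed handling here are illustrative, not taken from the paper):

```python
import random

def split_examples(examples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and partition examples into train/validation/test
    using the default 0.8/0.1/0.1 ratio quoted in the report.
    Hypothetical helper: the paper does not specify this exact code."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_examples(range(100))
# 80 / 10 / 10 examples
```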
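The Experiment Setup row describes drawing 64 decoded floating-point samples and taking their empirical median as the final prediction. A minimal sketch of that aggregation step, with `sample_fn` standing in for one constrained decode from the language model (a hypothetical placeholder, not the paper's actual decoder):

```python
import random
import statistics

def predict_by_median(sample_fn, num_samples=64):
    """Draw several candidate y-value samples and return their
    empirical median, mirroring the aggregation described in the
    quoted setup. `sample_fn` is a stand-in for a model decode."""
    samples = [sample_fn() for _ in range(num_samples)]
    return statistics.median(samples)

# Toy stand-in sampler: noisy draws around a "true" value of 3.2.
random.seed(0)
prediction = predict_by_median(lambda: 3.2 + random.gauss(0.0, 0.1))
```

The median is a natural choice here because it is robust to the occasional badly decoded outlier sample, unlike the mean.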