Position: The Future of Bayesian Prediction Is Prior-Fitted
Authors: Samuel Müller, Arik Reuter, Noah Hollmann, David Rügamer, Frank Hutter
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this position paper, we explore their potential and directions to address their current limitations. ... For technical details of the training procedure for the experiments in this position paper, we refer to Appendix A. ... Appendix A. Experimental Setup ... We show the average standard deviation of 100 datasets sampled from our prior, each normalized by its final standard deviation. ... We show some example trajectories in the Appendix in Figure 4. |
| Researcher Affiliation | Collaboration | 1University of Freiburg, Freiburg, Germany 2Meta, New York (work done at University of Freiburg) 3LMU Munich, Munich, Germany 4Prior Labs 5Munich Center for Machine Learning (MCML), Munich, Germany 6ELLIS Institute Tübingen, Tübingen, Germany. |
| Pseudocode | Yes | Algorithm 1 Latent Prior and Sampling Functions |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | No | In this work, we argue that training on vast quantities of synthetic data is ideally suited to utilize the rapidly increasing amount of available computational resources for neural network pre-training in these areas. ... For the analysis on the Martingale property in Section 5, we utilized the standard GP proposed by Müller et al. (2022) with an RBF-Kernel... Inputs were uniformly sampled between 0 and 1. ... To illustrate the PFN architecture's limitation in counting duplicated samples, we trained a PFN on random coin flips in Section 6.5. |
| Dataset Splits | No | Training set sizes were uniformly sampled from 1 to 100. This describes the size of the generated datasets, not explicit train/test/validation splits for a fixed dataset. |
| Hardware Specification | No | We are grateful for the computational resources that were available for this research. Specifically, we acknowledge support by the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant no. INST 39/963-1 FUGG (bwForCluster NEMO). |
| Software Dependencies | No | A promising first step could be incorporating zero attention (`add_zero_attn` in PyTorch; Paszke et al., 2019). ... a highly standardized procedure that could even be compiled to ONNX (developers, 2021). The paper mentions software names but does not specify exact version numbers for key libraries or frameworks used in their experiments. |
| Experiment Setup | Yes | A grid search identified the model with the best final training loss. We searched across 4 and 8 layers, batch sizes of 32 and 64, Adam learning rates of 0.0001, 0.0003, and 0.001, embedding sizes of 128, 256, and 512, and step counts of 100,000, 200,000, and 400,000. |
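The "Open Datasets" and "Dataset Splits" rows describe the paper's synthetic-data setup: datasets are drawn from a GP prior with an RBF kernel, inputs sampled uniformly in [0, 1], and training set sizes sampled uniformly from 1 to 100. A minimal sketch of such a sampler is below; the lengthscale and jitter values are assumptions, since the paper (per the excerpts) defers kernel hyperparameters to Müller et al. (2022).

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.1):
    # Squared-exponential (RBF) covariance matrix for 1-D inputs.
    # lengthscale is an assumed value, not taken from the paper.
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp_dataset(rng, max_size=100, lengthscale=0.1, jitter=1e-6):
    # Dataset size uniform in {1, ..., 100}; inputs uniform in [0, 1],
    # matching the quoted experimental description.
    n = int(rng.integers(1, max_size + 1))
    x = rng.uniform(0.0, 1.0, size=n)
    K = rbf_kernel(x, lengthscale) + jitter * np.eye(n)  # jitter for stability
    y = rng.multivariate_normal(np.zeros(n), K)
    return x, y

rng = np.random.default_rng(0)
x, y = sample_gp_dataset(rng)
```

In a PFN training loop, each such dataset would form one in-context prediction task; the sketch only reproduces the prior-sampling step quoted above.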
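The grid in the "Experiment Setup" row is fully enumerable (2 × 2 × 3 × 3 × 3 = 108 configurations). A hedged sketch of that search is below; `train_fn` is a hypothetical stand-in for the paper's training loop, which is not published.

```python
from itertools import product

# Hyperparameter grid exactly as reported in the "Experiment Setup" row.
grid = {
    "layers": [4, 8],
    "batch_size": [32, 64],
    "lr": [1e-4, 3e-4, 1e-3],
    "embedding_size": [128, 256, 512],
    "steps": [100_000, 200_000, 400_000],
}

def grid_search(train_fn):
    # train_fn(cfg) must return the final training loss for a configuration;
    # the paper selects the model with the best (lowest) final training loss.
    best_cfg, best_loss = None, float("inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        loss = train_fn(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

Exhaustive enumeration is reasonable here because the grid is small; at 108 runs the dominant cost is the 100k–400k training steps per configuration, not the search itself.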