Position: The Future of Bayesian Prediction Is Prior-Fitted

Authors: Samuel Müller, Arik Reuter, Noah Hollmann, David Rügamer, Frank Hutter

ICML 2025

Reproducibility assessment. Each entry lists the variable, the result, and the supporting excerpt from the LLM response.
Research Type: Experimental. Evidence: "In this position paper, we explore their potential and directions to address their current limitations." ... "For technical details of the training procedure for the experiments in this position paper, we refer to Appendix A." ... "Appendix A. Experimental Setup" ... "We show the average standard deviation of 100 datasets sampled from our prior, each normalized by its final standard deviation." ... "We show some example trajectories in the Appendix in Figure 4."
Researcher Affiliation: Collaboration. Evidence: "1 University of Freiburg, Freiburg, Germany; 2 Meta, New York (work done at University of Freiburg); 3 LMU Munich, Munich, Germany; 4 Prior Labs; 5 Munich Center for Machine Learning (MCML), Munich, Germany; 6 ELLIS Institute Tübingen, Tübingen, Germany."
Pseudocode: Yes. Evidence: "Algorithm 1: Latent Prior and Sampling Functions."
Open Source Code: No. The paper does not explicitly state that source code for the described methodology will be released, nor does it link to a code repository.
Open Datasets: No. Evidence: "In this work, we argue that training on vast quantities of synthetic data is ideally suited to utilize the rapidly increasing amount of available computational resources for neural network pre-training in these areas." ... "For the analysis of the Martingale property in Section 5, we utilized the standard GP proposed by Müller et al. (2022) with an RBF kernel... Inputs were uniformly sampled between 0 and 1." ... "To illustrate the PFN architecture's limitation in counting duplicated samples, we trained a PFN on random coin flips in Section 6.5."
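The Open Datasets entry notes that the Martingale analysis used synthetic data drawn from a standard GP prior with an RBF kernel and inputs sampled uniformly on [0, 1]. A minimal sketch of such a sampler is given below; the lengthscale and jitter values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.1):
    # Squared-exponential (RBF) kernel matrix for 1-D inputs.
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp_dataset(n_points, lengthscale=0.1, jitter=1e-4, rng=None):
    # Draw inputs uniformly on [0, 1] and a function sample from the GP prior.
    rng = np.random.default_rng(rng)
    x = rng.uniform(0.0, 1.0, size=n_points)
    K = rbf_kernel(x, lengthscale) + jitter * np.eye(n_points)
    # Sample f ~ N(0, K) via the Cholesky factor of K.
    y = np.linalg.cholesky(K) @ rng.standard_normal(n_points)
    return x, y

x, y = sample_gp_dataset(50, rng=0)
```

In the PFN setting, many such datasets would be drawn from the prior and used as pre-training examples.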
Dataset Splits: No. Evidence: "Training set sizes were uniformly sampled from 1 to 100." This describes the sizes of the generated datasets, not explicit train/validation/test splits for a fixed dataset.
Hardware Specification: No. Evidence: "We are grateful for the computational resources that were available for this research. Specifically, we acknowledge support by the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant no. INST 39/963-1 FUGG (bwForCluster NEMO)."
Software Dependencies: No. Evidence: "A promising first step could be incorporating zero attention (add_zero_attn in PyTorch; Paszke et al., 2019)." ... "a highly standardized procedure that could even be compiled to ONNX (developers, 2021)." The paper names software but does not specify version numbers for the key libraries or frameworks used in its experiments.
Experiment Setup: Yes. Evidence: "A grid search identified the model with the best final training loss. We searched across 4 and 8 layers, batch sizes of 32 and 64, Adam learning rates of 0.0001, 0.0003, and 0.001, embedding sizes of 128, 256, and 512, and step counts of 100,000, 200,000, and 400,000."
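The grid described in the Experiment Setup entry can be enumerated directly. The sketch below only illustrates the size and shape of that search space; the dictionary keys are illustrative names, not identifiers from the authors' code.

```python
from itertools import product

# Hyperparameter grid as described in the paper's Appendix A.
grid = {
    "layers": [4, 8],
    "batch_size": [32, 64],
    "lr": [0.0001, 0.0003, 0.001],
    "embedding_size": [128, 256, 512],
    "steps": [100_000, 200_000, 400_000],
}

# Cartesian product over all axes: 2 * 2 * 3 * 3 * 3 = 108 configurations.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))
```

Each configuration would be trained and the one with the best final training loss selected, per the quoted setup.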