Understanding the difficulties of posterior predictive estimation
Authors: Abhinav Agrawal, Justin Domke
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main contribution is a theoretical analysis demonstrating that even with exact inference, SNR can decay rapidly with an increase in (a) the mismatch between training and test data, (b) the dimensionality of the latent space, or (c) the size of test data relative to training data. Through several examples, we empirically verify these claims and show that these factors indeed lead to poor SNR and unreliable PPD estimates (sometimes, estimates are off by hundreds of nats even with a million samples). |
| Researcher Affiliation | Academia | 1Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA. Correspondence to: Abhinav Agrawal <EMAIL>. |
| Pseudocode | Yes | Figure 7 provides the pseudocode. Learned-IS(D, K): w ← Optimize(IW-ELBO); z_k ∼ q_w for k ∈ {1, …, K}; return (1/K) Σ_{k=1}^{K} p(D \| z_k)/q_w(z_k). |
| Open Source Code | No | The paper states 'All our code is implemented in JAX (Bradbury et al., 2018)' and 'While we implement our own inference schemes for this paper, we expect the results to be similar if we use the aforementioned libraries.' However, it does not provide an explicit statement of code release or a link to a code repository for the methodology described in this paper. |
| Open Datasets | Yes | Figure 1 shows log PPD_q estimates for a user-preference model on the MovieLens-25M dataset (Harper & Konstan, 2015), with approximate posterior q_D produced from variational inference (VI) with either a Gaussian or flow-based family (see section 5.4 for setup). |
| Dataset Splits | Yes | We used a train-test split such that, for each user, one-tenth of the ratings are in the test set. This gives us 18M ratings for training (and 2M ratings for testing). |
| Hardware Specification | Yes | All our code is implemented in JAX (Bradbury et al., 2018) and run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions using 'JAX (Bradbury et al., 2018)', 'ADAM (Kingma & Ba, 2015)', and the 'DReG gradient (Tucker et al., 2019)', but does not provide version numbers for these software dependencies. |
| Experiment Setup | Yes | To learn the variational parameters, we optimize the standard ELBO using ADAM (Kingma & Ba, 2015) with a learning rate of 0.001 for 10,000 iterations. For each iteration, we use a batch of 16 samples for estimating the DReG gradient (Tucker et al., 2019). For LIS: 'optimize IW-ELBO_M using ADAM (Kingma & Ba, 2015) with a learning rate of 0.001 for 1000 iterations. For each iteration, we use a single sample of the DReG estimator. We set M = 16 for all our experiments.' |
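The SNR-decay claim quoted under "Research Type" can be illustrated with a minimal NumPy sketch. This is our own toy construction, not the paper's experiments: a conjugate Gaussian model where the exact posterior and the true log PPD are available in closed form, so the naive Monte Carlo estimator can be checked directly. Even with exact posterior samples, the estimate degrades badly when the latent dimension grows and the test data is mismatched with the training data.

```python
import numpy as np

# Toy conjugate model (our assumption, not the paper's): prior z ~ N(0, I_d),
# observation x | z ~ N(z, I_d). Given one training point x_train, the exact
# posterior is N(x_train / 2, I/2) and the true predictive is N(x_train / 2, 1.5 I).

def true_log_ppd(x_train, x_test):
    d = x_train.size
    mu, var = x_train / 2.0, 1.5
    return -0.5 * d * np.log(2 * np.pi * var) - np.sum((x_test - mu) ** 2) / (2 * var)

def mc_log_ppd(x_train, x_test, K, rng):
    d = x_train.size
    # Draw K samples from the EXACT posterior (no inference error at all).
    z = rng.normal(x_train / 2.0, np.sqrt(0.5), size=(K, d))
    # log p(x_test | z_k) for each sample.
    log_lik = -0.5 * d * np.log(2 * np.pi) - 0.5 * np.sum((x_test - z) ** 2, axis=1)
    # log of the Monte Carlo average, computed stably (log-mean-exp).
    m = log_lik.max()
    return m + np.log(np.mean(np.exp(log_lik - m)))

rng = np.random.default_rng(0)
# Matched test data, low dimension: the estimate is accurate.
xtr, xte = np.zeros(2), np.zeros(2)
print(true_log_ppd(xtr, xte), mc_log_ppd(xtr, xte, 100_000, rng))
# Mismatched test data, d = 100: the estimate underestimates the truth by
# a large margin even with 100,000 exact posterior samples.
xtr, xte = np.zeros(100), 3.0 * np.ones(100)
print(true_log_ppd(xtr, xte), mc_log_ppd(xtr, xte, 100_000, rng))
```

The failure mode is the one the paper describes: the integrand p(D\*|z) is dominated by a region of z-space the posterior rarely visits, so the log-mean-exp estimate collapses toward the largest sampled likelihood and is biased low.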
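At evaluation time, a Learned-IS-style scheme reduces to a self-normalized importance-sampling (SNIS) estimate of log PPD under a fitted proposal. A minimal sketch of that final step, assuming a generic log-density interface (all function names are ours): we skip the IW-ELBO fitting stage and instead plug in the exact posterior of a toy conjugate Gaussian model (prior N(0, I), likelihood N(z, I)) as the proposal, which makes the weights constant and the estimator easy to validate against the closed-form predictive.

```python
import numpy as np

def logsumexp(a):
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

def snis_log_ppd(log_joint_train, log_lik_test, log_q, z):
    """Self-normalized IS estimate of log PPD from proposal samples z: (K, d)."""
    log_w = log_joint_train(z) - log_q(z)              # unnormalized posterior weights
    return logsumexp(log_w + log_lik_test(z)) - logsumexp(log_w)

# Toy conjugate model (our construction): z ~ N(0, I), x | z ~ N(z, I).
log_normal = lambda x, mu, var: (-0.5 * x.shape[-1] * np.log(2 * np.pi * var)
                                 - np.sum((x - mu) ** 2, axis=-1) / (2 * var))

d = 2
x_train, x_test = np.ones(d), np.ones(d)
log_joint = lambda z: log_normal(z, 0.0, 1.0) + log_normal(x_train, z, 1.0)
log_q = lambda z: log_normal(z, x_train / 2.0, 0.5)    # exact posterior as proposal
log_lik = lambda z: log_normal(x_test, z, 1.0)

rng = np.random.default_rng(1)
z = rng.normal(x_train / 2.0, np.sqrt(0.5), size=(50_000, d))  # z_k ~ q
print(snis_log_ppd(log_joint, log_lik, log_q, z))
print(log_normal(x_test, x_train / 2.0, 1.5))          # closed-form predictive N(x_train/2, 1.5 I)
```

With the exact posterior as proposal the weights are constant and SNIS matches the truth; with a learned q the same estimator inherits whatever SNR the weights allow, which is exactly what the paper's analysis studies.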