A Stochastic Polynomial Expansion for Uncertainty Propagation through Networks

Authors: Songhan Zhang, ShiNung Ching

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To benchmark PTPE for uncertainty estimation in neural networks, we trained 9 residual neural networks (He et al. (2016)) with three depths (13, 33, and 65 layers) and three typical nonlinearities (Tanh, ReLU, GELU) on CIFAR10 (Krizhevsky (2009)). We corrupted each input image with additive Gaussian noise to simulate noise in low-light conditions (the first type of corruption in Hendrycks and Dietterich (2019)), then compared the PTPE-predicted and reference (via $10^7$ Monte Carlo samples) logit distributions. Four levels of corruption, with noise variance values of [1, 10, 100, 1000], were applied to the RGB values ([0, 255]) of the input image. If z-scored, the corresponding noise variance scales are [1e-5, 1e-4, 1e-3, 1e-2]. Visualizations of the corrupted images are shown in Fig. 2. The layerwise application of PTPE is outlined in Algorithm 1 with accompanying pseudocode. We measure the estimation accuracy of moments in three ways: the Euclidean distance from the reference mean to the predicted mean, $\|\mu_{\mathrm{est}} - \mu_{\mathrm{ref}}\|_2$; the Frobenius norm of the covariance residual, $\|\Sigma_{\mathrm{est}} - \Sigma_{\mathrm{ref}}\|_{\mathrm{fro}}$; and the 2-Wasserstein distance (or Kantorovich-Rubinstein metric) between the reference and estimated distributions, assuming both distributions are Gaussian. This 2-Wasserstein distance is defined as $\left( \|\mu_{\mathrm{est}} - \mu_{\mathrm{ref}}\|_2^2 + \mathrm{trace}\left( \Sigma_{\mathrm{est}} + \Sigma_{\mathrm{ref}} - 2 \left( \Sigma_{\mathrm{est}}^{1/2} \Sigma_{\mathrm{ref}} \Sigma_{\mathrm{est}}^{1/2} \right)^{1/2} \right) \right)^{1/2}$. We summarize the results in Figs. 3, 9, and 10. Overall, the experimental results align with expectations: (1) Jacobian linearization degrades dramatically in the moderate-to-high-variance regime. (2) Direct derivation is not suitable for this task due to its assumption of independence, since overlapping convolution kernels and residual layers introduce substantial correlation. (3) Introducing PTPE up to the third order typically outperforms stochastic and Jacobian linearization by a large margin.
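The Gaussian 2-Wasserstein metric quoted above can be computed directly from the two moment pairs. A minimal sketch with numpy only (function names are illustrative, not from the paper's codebase; the PSD matrix square root is taken via eigendecomposition):

```python
import numpy as np

def psd_sqrt(m):
    # symmetric PSD square root via eigendecomposition
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def gaussian_w2(mu_est, cov_est, mu_ref, cov_ref):
    """2-Wasserstein distance between N(mu_est, cov_est) and N(mu_ref, cov_ref)."""
    s = psd_sqrt(cov_est)
    cross = psd_sqrt(s @ cov_ref @ s)
    mean_term = np.sum((mu_est - mu_ref) ** 2)
    trace_term = np.trace(cov_est + cov_ref - 2.0 * cross)
    # clip tiny negative values caused by floating-point error
    return float(np.sqrt(max(mean_term + trace_term, 0.0)))
```

For identical moment pairs the distance is zero, and for isotropic covariances it reduces to the familiar closed form in the means and standard deviations.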
Researcher Affiliation | Academia | Songhan Zhang EMAIL Department of Electrical and Systems Engineering, Washington University in St. Louis; ShiNung Ching EMAIL Department of Electrical and Systems Engineering, Washington University in St. Louis
Pseudocode | Yes | The layerwise application of PTPE is outlined in Algorithm 1 with accompanying pseudocode. ... Algorithm 1 Propagating a multi-variate Gaussian distribution through a pretrained ResNet
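Algorithm 1 itself (PTPE) is not reproduced in this report. For context, the Jacobian-linearization baseline that PTPE is compared against propagates a Gaussian through a differentiable layer by mapping the mean through the layer and the covariance through its Jacobian. A minimal sketch under that standard definition (illustrative, not the authors' code):

```python
import numpy as np

def linearized_propagate(f, jacobian, mu, cov):
    """First-order (Jacobian) moment propagation: mu -> f(mu), cov -> J cov J^T.
    This is the baseline PTPE is compared against, NOT the PTPE algorithm."""
    J = jacobian(mu)
    return f(mu), J @ cov @ J.T

# example: an elementwise tanh layer
def tanh_jac(x):
    # Jacobian of elementwise tanh is diagonal with entries 1 - tanh(x)^2
    return np.diag(1.0 - np.tanh(x) ** 2)

mu, cov = np.zeros(2), 0.01 * np.eye(2)
mu_out, cov_out = linearized_propagate(np.tanh, tanh_jac, mu, cov)
# at mu = 0, tanh'(0) = 1, so to first order the covariance passes through unchanged
```

Stacking this layer by layer gives the linearized estimate whose accuracy degrades at larger input variances, which is the regime where the report notes PTPE retains its advantage.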
Open Source Code | Yes | All code for reproducing the experiments and figures is publicly available at https://github.com/songhanz/Stochastic_Polynomial_Expansion.
Open Datasets | Yes | To benchmark PTPE for uncertainty estimation in neural networks, we trained 9 residual neural networks (He et al. (2016)) with three depths (13, 33, and 65 layers) and three typical nonlinearities (Tanh, ReLU, GELU) on CIFAR10 (Krizhevsky (2009)). ...conducting regression experiments on eight UCI datasets. ...train the model to reconstruct MNIST handwritten digits (LeCun et al., 1998). ...out-of-distribution (OOD) detection in MNIST. Here, we test how models trained with PTPE respond to rotated and OOD images, using Fashion MNIST (Xiao et al., 2017) as OOD data.
Dataset Splits | Yes | We randomly set aside 10% of the data as test samples, and the error bars reflect the results from 20 random splits.
Hardware Specification | No | The paper mentions the general use of "GPUs" for tensor calculation but does not specify any particular GPU models or configurations used for the experiments.
Software Dependencies | No | The paper mentions MATLAB and SciPy as tools used for specific derivations but does not provide a comprehensive list of software dependencies with version numbers for the experimental setup or model training.
Experiment Setup | Yes | To benchmark PTPE for uncertainty estimation in neural networks, we trained 9 residual neural networks (He et al. (2016)) with three depths (13, 33, and 65 layers) and three typical nonlinearities (Tanh, ReLU, GELU) on CIFAR10 (Krizhevsky (2009)). We corrupted each input image with additive Gaussian noise to simulate noise in low-light conditions... Four levels of corruption, with noise variance values of [1, 10, 100, 1000]... Following the methodology suggested by Hernández-Lobato and Adams (2015), we search over MLPs with up to four layers containing 50 hidden units (100 for the larger Protein Structure dataset) and report the best test performance.
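The corruption step in the setup, additive Gaussian noise with pixel-space variances [1, 10, 100, 1000] applied to [0, 255] values, can be sketched as below. Dividing each pixel-space variance by 255^2 recovers approximately the z-scored scales [1e-5, 1e-4, 1e-3, 1e-2] quoted in the report (the `corrupt` helper name is illustrative, not from the paper):

```python
import numpy as np

def corrupt(image_uint8, var_pixel, rng=None):
    """Add zero-mean Gaussian noise with the given variance in [0, 255] pixel space."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(var_pixel), size=image_uint8.shape)
    return np.clip(image_uint8.astype(np.float64) + noise, 0.0, 255.0)

# pixel-space variances and their z-scored equivalents (variance / 255^2)
for v in [1, 10, 100, 1000]:
    print(v, v / 255.0 ** 2)  # ≈ 1.5e-5, 1.5e-4, 1.5e-3, 1.5e-2
```

The z-scored values are roughly 1.5x the round numbers listed in the report, consistent with the report quoting order-of-magnitude scales.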