A Stochastic Polynomial Expansion for Uncertainty Propagation through Networks

Authors: Songhan Zhang, ShiNung Ching

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To benchmark PTPE for uncertainty estimation in neural networks, we trained 9 residual neural networks (He et al. (2016)) with three depths (13, 33, and 65 layers) and three typical nonlinearities (Tanh, ReLU, GELU) on CIFAR10 (Krizhevsky (2009)). We corrupted each input image with additive Gaussian noise to simulate noise in low-light conditions (the first type of corruption in Hendrycks and Dietterich (2019)), then compared the PTPE-predicted and reference (via $10^7$ Monte Carlo samples) logit distributions. Four levels of corruption, with noise variance values of [1, 10, 100, 1000], were applied to the RGB values ([0, 255]) of the input image. If z-scored, the corresponding noise variance scales are [1e-5, 1e-4, 1e-3, 1e-2]. Visualizations of the corrupted images are shown in Fig. 2. The layerwise application of PTPE is outlined in Algorithm 1 with accompanying pseudocode. We measure the estimation accuracy of moments in three ways: the Euclidean distance from the reference mean to the predicted mean, $\|\mu_{\mathrm{est}} - \mu_{\mathrm{ref}}\|_2$; the Frobenius norm of the covariance residual, $\|\Sigma_{\mathrm{est}} - \Sigma_{\mathrm{ref}}\|_{\mathrm{fro}}$; and the 2-Wasserstein distance (or Kantorovich-Rubinstein metric) between the reference and estimated distributions, assuming both distributions are Gaussian. This 2-Wasserstein distance is defined as $\left( \|\mu_{\mathrm{est}} - \mu_{\mathrm{ref}}\|_2^2 + \mathrm{trace}\left( \Sigma_{\mathrm{est}} + \Sigma_{\mathrm{ref}} - 2 \left( \Sigma_{\mathrm{est}}^{1/2} \Sigma_{\mathrm{ref}} \Sigma_{\mathrm{est}}^{1/2} \right)^{1/2} \right) \right)^{1/2}$. We summarize the results in Figs. 3, 9, and 10. Overall, the experimental results align with expectations: (1) Jacobian linearization degrades dramatically in the moderate-to-high-variance regime. (2) Direct derivation is not suitable for this task due to its assumption of independence, since overlapping convolution kernels and residual layers introduce substantial correlation. (3) Introducing PTPE up to the third order typically outperforms stochastic and Jacobian linearization by a large margin.
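The Gaussian 2-Wasserstein metric quoted above can be computed directly from the two moment pairs. A minimal sketch with numpy only (function names are illustrative, not from the paper's codebase; the PSD matrix square root is taken via eigendecomposition):

```python
import numpy as np

def psd_sqrt(m):
    # symmetric PSD square root via eigendecomposition
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def gaussian_w2(mu_est, cov_est, mu_ref, cov_ref):
    """2-Wasserstein distance between N(mu_est, cov_est) and N(mu_ref, cov_ref)."""
    s = psd_sqrt(cov_est)
    cross = psd_sqrt(s @ cov_ref @ s)
    mean_term = np.sum((mu_est - mu_ref) ** 2)
    trace_term = np.trace(cov_est + cov_ref - 2.0 * cross)
    # clip tiny negative values caused by floating-point error
    return float(np.sqrt(max(mean_term + trace_term, 0.0)))
```

For identical moment pairs the distance is zero, and for isotropic covariances it reduces to the familiar closed form in the means and standard deviations.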
Researcher Affiliation | Academia | Songhan Zhang EMAIL Department of Electrical and Systems Engineering, Washington University in St. Louis; ShiNung Ching EMAIL Department of Electrical and Systems Engineering, Washington University in St. Louis
Pseudocode | Yes | The layerwise application of PTPE is outlined in Algorithm 1 with accompanying pseudocode. ... Algorithm 1 Propagating a multi-variate Gaussian distribution through a pretrained ResNet
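Algorithm 1 itself (PTPE) is not reproduced in this report. For context, the Jacobian-linearization baseline that PTPE is compared against propagates a Gaussian through a differentiable layer by mapping the mean through the layer and the covariance through its Jacobian. A minimal sketch under that standard definition (illustrative, not the authors' code):

```python
import numpy as np

def linearized_propagate(f, jacobian, mu, cov):
    """First-order (Jacobian) moment propagation: mu -> f(mu), cov -> J cov J^T.
    This is the baseline PTPE is compared against, NOT the PTPE algorithm."""
    J = jacobian(mu)
    return f(mu), J @ cov @ J.T

# example: an elementwise tanh layer
def tanh_jac(x):
    # Jacobian of elementwise tanh is diagonal with entries 1 - tanh(x)^2
    return np.diag(1.0 - np.tanh(x) ** 2)

mu, cov = np.zeros(2), 0.01 * np.eye(2)
mu_out, cov_out = linearized_propagate(np.tanh, tanh_jac, mu, cov)
# at mu = 0, tanh'(0) = 1, so to first order the covariance passes through unchanged
```

Stacking this layer by layer gives the linearized estimate whose accuracy degrades at larger input variances, which is the regime where the report notes PTPE retains its advantage.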
Open Source Code | Yes | All code for reproducing the experiments and figures is publicly available at https://github.com/songhanz/Stochastic_Polynomial_Expansion.
Open Datasets | Yes | To benchmark PTPE for uncertainty estimation in neural networks, we trained 9 residual neural networks (He et al. (2016)) with three depths (13, 33, and 65 layers) and three typical nonlinearities (Tanh, ReLU, GELU) on CIFAR10 (Krizhevsky (2009)). ...conducting regression experiments on eight UCI datasets. ...train the model to reconstruct MNIST handwritten digits (LeCun et al., 1998). ...out-of-distribution (OOD) detection in MNIST. Here, we test how models trained with PTPE respond to rotated and OOD images, using Fashion MNIST (Xiao et al., 2017) as OOD data.
Dataset Splits | Yes | We randomly set aside 10% of the data as test samples, and the error bars reflect the results from 20 random splits.
Hardware Specification | No | The paper mentions the general use of "GPUs" for tensor calculation but does not specify any particular GPU models or configurations used for the experiments.
Software Dependencies | No | The paper mentions MATLAB and SciPy as tools used for specific derivations but does not provide a comprehensive list of software dependencies with version numbers for the experimental setup or model training.
Experiment Setup | Yes | To benchmark PTPE for uncertainty estimation in neural networks, we trained 9 residual neural networks (He et al. (2016)) with three depths (13, 33, and 65 layers) and three typical nonlinearities (Tanh, ReLU, GELU) on CIFAR10 (Krizhevsky (2009)). We corrupted each input image with additive Gaussian noise to simulate noise in low-light conditions... Four levels of corruption, with noise variance values of [1, 10, 100, 1000]... Following the methodology suggested by Hernández-Lobato and Adams (2015), we search over MLPs with up to four layers containing 50 hidden units (100 for the larger Protein Structure dataset) and report the best test performance.
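The corruption step in the setup, additive Gaussian noise with pixel-space variances [1, 10, 100, 1000] applied to [0, 255] values, can be sketched as below. Dividing each pixel-space variance by 255^2 recovers approximately the z-scored scales [1e-5, 1e-4, 1e-3, 1e-2] quoted in the report (the `corrupt` helper name is illustrative, not from the paper):

```python
import numpy as np

def corrupt(image_uint8, var_pixel, rng=None):
    """Add zero-mean Gaussian noise with the given variance in [0, 255] pixel space."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, np.sqrt(var_pixel), size=image_uint8.shape)
    return np.clip(image_uint8.astype(np.float64) + noise, 0.0, 255.0)

# pixel-space variances and their z-scored equivalents (variance / 255^2)
for v in [1, 10, 100, 1000]:
    print(v, v / 255.0 ** 2)  # ≈ 1.5e-5, 1.5e-4, 1.5e-3, 1.5e-2
```

The z-scored values are roughly 1.5x the round numbers listed in the report, consistent with the report quoting order-of-magnitude scales.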