Learning Robust Neural Processes with Risk-Averse Stochastic Optimization

Authors: Huafeng Liu, Yiran Fu, Liping Jing, Hui Li, Shuyang Lin, Jingyue Shi, Deqiang Ouyang, Jian Yu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental To illustrate the superiority of the proposed model, we perform experiments on both synthetic and real-world data, and the results demonstrate that our approach not only helps to achieve more accurate performance but also improves model robustness. 5. Experiments We started with learning predictive functions on synthetic datasets, and high-dimensional tasks, e.g., image completion, Bayesian optimization, and contextual bandits, were performed to evaluate the properties of the NP-related models.
Researcher Affiliation Academia 1Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing, China 2School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China 3State Key Laboratory of Advanced Rail Autonomous Operation, Beijing, China 4College of Computer Science, Chongqing University, Chongqing, China. Correspondence to: Deqiang Ouyang, Liping Jing <EMAIL, EMAIL>.
Pseudocode Yes Algorithm 1: Variance-Reduced Stochastic Mirror Prox Algorithm for Tail Task Risk Optimization.
Input: risk functions {ℓ_m(θ)}_{m∈[M]} related to NPs, epoch number S, iteration numbers {K_s}, learning rates {η_k^s}, and weights {w_k^s}.
1: Initialize parameters (θ, q)_0^0 = (θ_0, q_0) = argmin_{(θ,q)∈Θ×Q_α} ψ((θ, q)) as the starting point.
2: for s = 0 to S − 1 do
3:   Compute the snapshot (θ, q)^s and the mirror snapshot ∇ψ((θ, q)^s) according to Eq. (8) and Eq. (9), respectively.
4:   Compute the full gradient F_α((θ, q)^s) according to Eq. (10) (a mini-batch is feasible).
5:   for k = 0 to K_s − 1 do
6:     Compute (θ, q)_{k+1/2}^s according to Eq. (11).
7:     For each m ∈ [M], sample Z_{k,m}^s ∼ U({Z_{m,i}}_{i=1}^{n_m}).
8:     Compute the variance-reduced stochastic gradient estimator g_k^s defined in Eq. (13).
9:     Compute (θ, q)_{k+1}^s according to Eq. (14).
10:    end for
11:   Set (θ, q)_0^{s+1} = (θ, q)_{K_s}^s.
12: end for
13: Return (θ, q)^S according to Eq. (15).
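The two-loop structure above (outer snapshot plus full gradient, inner extragradient steps with a variance-reduced estimator) can be sketched numerically. This is a simplified illustration, not the paper's method: it uses a Euclidean mirror map, a plain minimization objective in place of the joint (θ, q) formulation, and constant step sizes; the function names and the quadratic test problem in the usage note are hypothetical.

```python
import numpy as np

def vr_mirror_prox(grads, full_grad, x0, epochs=5, inner=50, eta=0.1, rng=None):
    """Sketch of variance-reduced stochastic mirror prox (Euclidean mirror map).

    grads: list of per-task gradient functions; full_grad: gradient of the
    averaged objective. Returns the final iterate.
    """
    rng = np.random.default_rng(rng)
    M = len(grads)
    x = np.asarray(x0, dtype=float)
    for s in range(epochs):
        snap = x.copy()                       # snapshot iterate for this epoch
        mu = full_grad(snap)                  # full gradient at the snapshot
        for k in range(inner):
            i = rng.integers(M)               # sample a task index
            # variance-reduced estimator: stochastic gradient corrected
            # by its value at the snapshot plus the full snapshot gradient
            g_half = grads[i](x) - grads[i](snap) + mu
            x_half = x - eta * g_half         # extrapolation (half) step
            j = rng.integers(M)
            g = grads[j](x_half) - grads[j](snap) + mu
            x = x - eta * g                   # update step uses gradient at x_half
    return x
```

On a toy problem with per-task losses 0.5·(x − c_m)², the averaged minimizer is the mean of the c_m, and the iterate converges there.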
Open Source Code No The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets Yes We perform experiments on both synthetic and real-world data, and the results demonstrate that our approach not only helps to achieve more accurate performance but also improves model robustness. 5.1. Image Completion Following (Kim et al., 2019), we compared the models on image completion tasks on CelebA (Liu et al., 2015) and EMNIST (Cohen et al., 2017), where each image is downsampled to 32×32.
Dataset Splits Yes For image completion experiments on the EMNIST and CelebA datasets, ... we take random pixels of a given image at training as targets, and select a subset of this as contexts, again choosing the number of contexts and targets randomly (n ∼ U[3, 200], m ∼ n + U[0, 200 − n]).
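The quoted sampling scheme can be sketched directly: draw the number of contexts n uniformly from [3, 200], draw the number of targets m as n plus a uniform draw from [0, 200 − n], pick m random pixels as targets, and take the first n of them as contexts. The function name and the 32×32 = 1024 pixel count are illustrative assumptions.

```python
import numpy as np

def sample_context_target(num_pixels=1024, rng=None):
    """Sample context/target pixel index sets following n ~ U[3, 200],
    m ~ n + U[0, 200 - n], with contexts a subset of the targets."""
    rng = np.random.default_rng(rng)
    n = int(rng.integers(3, 201))             # number of context points
    m = n + int(rng.integers(0, 200 - n + 1)) # number of target points (>= n)
    target_idx = rng.choice(num_pixels, size=m, replace=False)
    context_idx = target_idx[:n]              # contexts are a subset of targets
    return context_idx, target_idx
```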
Hardware Specification Yes D.1. Infrastructure We implement our model with PyTorch, and conduct our experiments with: CPU: Intel Xeon Silver 4316. GPU: 8× GeForce RTX 4090. RAM: DDR4 384GB. ROM: 16TB 7.2K 6Gb SATA and 1× 960GB SATA 6Gb SSD. Operating system: Ubuntu 18.04 LTS.
Software Dependencies Yes D.1. Infrastructure ... Environments: Python 3.7; NumPy 1.18.1; SciPy 1.2.1; scikit-learn 0.23.2; seaborn 0.1; torch-geometric 1.6.1; matplotlib 3.1.3; dgl 0.4.2; PyTorch 1.6.
Experiment Setup Yes D.2.1. 1D REGRESSION For synthetic 1D regression experiments, the neural architectures for CNP, NP, ANP, BCNP, BNP, BANP, and our SCNP/SNP/SANP are given in Appendix C. The number of hidden units is d_h = 128 and the latent representation dimension is d_z = 128. The numbers of layers are l_e = l_de = l_la = l_qk = l_v = 2. ... We trained all models for 100,000 steps, with each step computing updates on a batch of 100 tasks. We used the Adam optimizer with an initial learning rate of 5×10^-4 and decayed the learning rate with a cosine annealing scheme for the baselines. For SCNP/SNP/SANP, we set K = 3.
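The quoted schedule (initial learning rate 5×10^-4, cosine annealing over 100,000 steps) corresponds to the standard closed form below. The minimum learning rate lr_min = 0 and the function name are assumptions; the report does not state the floor the authors used.

```python
import math

def cosine_annealed_lr(step, total_steps=100_000, lr_init=5e-4, lr_min=0.0):
    """Cosine-annealed learning rate: starts at lr_init, decays to lr_min
    following lr_min + 0.5 * (lr_init - lr_min) * (1 + cos(pi * t / T))."""
    return lr_min + 0.5 * (lr_init - lr_min) * (
        1 + math.cos(math.pi * step / total_steps)
    )
```

At step 0 this yields the stated initial rate 5×10^-4, at the halfway point half of it, and at the final step it reaches lr_min.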