Rethinking Robustness in Machine Learning: A Posterior Agreement Approach
Authors: João B. S. Carvalho, Víctor Jiménez Rodríguez, Alessandro Torcinovich, Antonio Emanuele Cinà, Carlos Cotrini, Lea Schönherr, Joachim M. Buhmann
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess the soundness of our measure in controlled environments and through an empirical robustness analysis in two different covariate shift scenarios: adversarial learning and domain generalization. We illustrate the suitability of PA by evaluating several models under shifts of different nature and magnitude, and different proportions of affected observations. The results show that PA offers a reliable analysis of the vulnerabilities in learning algorithms across different shift conditions and provides higher discriminability than accuracy-based measures, while requiring no supervision. |
| Researcher Affiliation | Academia | João B. S. Carvalho, Department of Computer Science, ETH Zurich; Víctor Jiménez Rodríguez, Department of Computer Science, ETH Zurich; Alessandro Torcinovich, Faculty of Engineering, Free University of Bozen-Bolzano and Department of Computer Science, ETH Zurich; Antonio E. Cinà, Department of Computer Science, University of Genoa; Carlos Cotrini, Department of Computer Science, ETH Zurich; Lea Schönherr, CISPA Helmholtz Center for Information Security; Joachim M. Buhmann, Department of Computer Science, ETH Zurich |
| Pseudocode | No | The paper describes mathematical derivations and theoretical properties but does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | For further technical information, the reader is referred to our code implementation. PA: https://github.com/viictorjimenezzz/pa-metric; Experiments: https://github.com/viictorjimenezzz/pa-covariate-shift |
| Open Datasets | Yes | For the adversarial robustness scenarios, we carry out our experiments with the CIFAR-10 (Krizhevsky et al., 2009) and the ImageNet (Deng et al., 2009) datasets, widely adopted in the machine learning security literature as benchmarks for robustness evaluation. ... In this scenario, we conduct our experiments through a modified version of the DiagViB-6 dataset (Eulig et al., 2021) that comprises distorted and upsampled coloured images of size 128×128 from the MNIST dataset (LeCun, 1998). |
| Dataset Splits | Yes | The CIFAR-10 dataset contains 60 000 colour images of 32×32 pixels equally distributed in 10 classes. The analyzed models are trained on the training set (50 000 images), and the PA evaluation is performed on the test set (10 000 images). ... Our final dataset comprises two sets of 40 000 images for training, two sets of 20 000 images for validation, and six sets of 10 000 images for testing. |
| Hardware Specification | No | We attested a relatively fast optimization of the β parameter, usually on the order of tens of minutes on a single GPU even for large datasets (i.e., ImageNet). This only mentions "single GPU" without any specific model or other hardware specifications. |
| Software Dependencies | No | The paper mentions using the "Adam (Kingma & Ba, 2015) optimization procedure", the "RobustBench library (Croce et al., 2021)", and the "AutoAttack library (Croce & Hein, 2020b)". However, no specific version numbers are provided for these software components or libraries. |
| Experiment Setup | Yes | PGD and FMN attacks are run for 1000 steps, while the number of steps in AutoAttack is determined automatically. The β parameter is searched with an Adam (Kingma & Ba, 2015) optimization procedure, run for 500 epochs. ... Adam is run for 1000 epochs to search for the β parameter. ... we used ERM and IRM algorithms to train a ResNet18 model for 50 epochs on Dtrain, using Adam with a learning rate of 10⁻². |
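The setup rows above mention searching a β parameter that governs a Gibbs posterior before the PA score is read off. The snippet below is a minimal NumPy sketch of one plausible form of such a posterior-agreement kernel, evaluated on two sets of classifier logits under covariate shift; the exact objective and the Adam-based β optimization live in the authors' `pa-metric` repository, so the kernel formula, the grid search in place of Adam, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gibbs_posterior(logits, beta):
    # Gibbs posterior with inverse temperature beta: p(c|x) ∝ exp(beta * logit_c)
    z = beta * logits
    z -= z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def posterior_agreement(logits_a, logits_b, betas):
    # Hypothetical PA kernel: expected log of the (class-count-scaled) overlap
    # between the two Gibbs posteriors, maximized over beta.
    # A plain grid search stands in for the paper's Adam-based beta search.
    n_classes = logits_a.shape[1]
    best = -np.inf
    for beta in betas:
        pa = gibbs_posterior(logits_a, beta)
        pb = gibbs_posterior(logits_b, beta)
        overlap = (pa * pb).sum(axis=1)          # per-sample posterior agreement
        score = np.log(n_classes * overlap).mean()
        best = max(best, score)
    return best
```

Under this sketch, logits that agree across the two shift conditions yield a higher PA score than logits whose predictions flip, which matches the intended use of PA as an unsupervised robustness signal.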