Privacy Attacks on Image AutoRegressive Models

Authors: Antoni Kowalczuk, Jan Dubiński, Franziska Boenisch, Adam Dziedzic

ICML 2025

Reproducibility assessment (variable: result, followed by the LLM response):
Research Type: Experimental
  "To address this gap, we conduct a comprehensive privacy analysis of IARs, comparing their privacy risks to the ones of DMs as reference points. Concretely, we develop a novel membership inference attack (MIA) that achieves a remarkably high success rate in detecting training images (with a True Positive Rate at False Positive Rate = 1% of 86.38% vs. 6.38% for DMs with comparable attacks). We leverage our novel MIA to provide dataset inference (DI) for IARs, and show that it requires as few as 6 samples to detect dataset membership (compared to 200 for DI in DMs), confirming a higher information leakage in IARs. Finally, we are able to extract hundreds of training data points from an IAR (e.g., 698 from VAR-d30)."
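The headline metric in this excerpt, TPR@FPR=1%, can be computed directly from per-sample membership scores: pick the score threshold at which at most 1% of non-members are falsely flagged, then measure the fraction of true members above it. A minimal sketch (function and variable names are illustrative, not taken from the paper's code):

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """True-positive rate at the score threshold where at most
    `target_fpr` of non-members are (falsely) flagged as members."""
    # Rank non-member scores from highest to lowest; the threshold is
    # the score just below the top `target_fpr` fraction.
    ranked = sorted(nonmember_scores, reverse=True)
    k = min(int(target_fpr * len(ranked)), len(ranked) - 1)
    threshold = ranked[k]
    # Count members whose score strictly exceeds the threshold.
    flagged = sum(score > threshold for score in member_scores)
    return flagged / len(member_scores)
```

With 10,000 members and 10,000 non-members, as in the paper's evaluation split, a return value of 0.8638 would correspond to the reported 86.38% TPR@FPR=1%.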
Researcher Affiliation: Academia
  "1CISPA Helmholtz Center for Information Security, Germany; 2Warsaw University of Technology, Poland; 3NASK National Research Institute, Poland. Correspondence to: Antoni Kowalczuk <EMAIL>, Jan Dubiński <EMAIL>, Franziska Boenisch <EMAIL>, Adam Dziedzic <EMAIL>."
Pseudocode: No
  The paper describes methods and procedures in narrative text sections (e.g., "5. Our Methods for Assessing Privacy in IARs") without formal pseudocode blocks or algorithms.
Open Source Code: Yes
  "We release the code at https://github.com/sprintml/privacy_attacks_against_iars for reproducibility."
Open Datasets: Yes
  "As these models were trained on ImageNet-1k (Deng et al., 2009) dataset, we use it to perform our privacy attacks."
Dataset Splits: Yes
  "For MIA and DI, we take 10000 samples from the training set as members and also 10000 samples from the validation set as non-members."
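Dataset inference builds on such a split: it aggregates per-sample MIA scores over a suspect set and asks whether they are statistically higher than scores from known non-members. A minimal stdlib sketch using a two-sample z-test (the test statistic and the 2.33 cutoff, roughly p < 0.01 one-sided, are illustrative assumptions; the paper's actual DI procedure may differ):

```python
import math
import statistics

def dataset_inference(suspect_scores, reference_scores, z_threshold=2.33):
    """Flag the suspect set as likely training data if its mean MIA
    score is significantly above the reference (non-member) scores."""
    n_s, n_r = len(suspect_scores), len(reference_scores)
    mean_s = statistics.mean(suspect_scores)
    mean_r = statistics.mean(reference_scores)
    # Sample variances (0 if a set is too small to estimate one).
    var_s = statistics.variance(suspect_scores) if n_s > 1 else 0.0
    var_r = statistics.variance(reference_scores) if n_r > 1 else 0.0
    se = math.sqrt(var_s / n_s + var_r / n_r)
    if se == 0:
        z = float("inf") if mean_s > mean_r else 0.0
    else:
        z = (mean_s - mean_r) / se
    return z > z_threshold
```

The point of the paper's "as few as 6 samples" result is that when the per-sample signal is strong, even a tiny suspect set separates cleanly from the reference distribution under such a test.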
Hardware Specification: Yes
  "We use the maximum batch size that fits on a single NVIDIA RTX A4000 48GB GPU, to utilize our hardware to the maximum, in order to ensure a fair comparison."
Software Dependencies: No
  "We use torchprofile (tor, 2021) Python library to measure GFLOPs used for generation and training." The paper does not provide specific version numbers for software dependencies or libraries.
Experiment Setup: Yes
  "This approach allows us to achieve a remarkably strong performance of 86.38% TPR@FPR=1%. Specifically, we increase the masking ratio from 0.86 (training average) to 0.95. Interestingly, we find that t = 500 is the most discriminative. We increase the noise sampling count from the default 4 used during training to 64. Following Wen et al. (2024), we use SSCD (Pizzi et al., 2022) score to calculate the similarity, and set the threshold τ = 0.75 such that every sample with a similarity ≥ τ will be considered as memorized. We perform iterative greedy sampling of the remaining tokens in the sequence for VAR and RAR, and for MAR we sample from the DM batch by batch."
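The memorization criterion in the quoted setup reduces to a nearest-neighbour similarity test: a generated sample counts as memorized if its best match in the training set reaches similarity ≥ τ = 0.75. A minimal sketch over precomputed embedding vectors, using cosine similarity as a stand-in for the SSCD score (helper names are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_memorized(generated_embs, train_embs, tau=0.75):
    """Indices of generated samples whose most similar training
    embedding reaches similarity >= tau (counted as memorized)."""
    memorized = []
    for i, gen in enumerate(generated_embs):
        best = max(cosine_similarity(gen, train) for train in train_embs)
        if best >= tau:
            memorized.append(i)
    return memorized
```

In the paper's pipeline the embeddings would come from the SSCD model; at ImageNet scale the pairwise search would also be batched on GPU rather than looped in Python as above.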