Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes

Authors: Augustin Godinot, Erwan Le Merrer, Camilla Penzo, François Taïani, Gilles Trédan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation shows that a simple baseline that we introduce performs on par with existing state-of-the-art fingerprints, which, on the other hand, are much more complex. To uncover the reasons behind this intriguing result, this paper introduces a systematic approach to both the creation of model fingerprinting schemes and their evaluation benchmarks. By dividing model fingerprinting into three core components, Query, Representation and Detection (QuRD), we are able to identify 100 previously unexplored QuRD combinations and gain insights into their performance. Finally, we introduce a set of metrics to compare and guide the creation of more representative model stealing detection benchmarks.
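The QuRD decomposition quoted above can be pictured as a three-stage pipeline: a Query stage picks the inputs to send, a Representation stage summarizes a model's answers, and a Detection stage compares two such summaries. The sketch below is illustrative; the class and field names are assumptions, not the actual API of the authors' toolbox.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class QuRDScheme:
    """Hypothetical sketch of a fingerprint as a (Query, Representation,
    Detection) triple. Swapping any one component yields a new scheme,
    which is how the paper enumerates unexplored QuRD combinations."""

    query: Callable[[], Sequence]                    # Q: choose which inputs to send
    represent: Callable[[object, Sequence], object]  # R: summarize a model's answers
    detect: Callable[[object, object], bool]         # D: compare two representations

    def is_stolen(self, victim, suspect) -> bool:
        # Query both models on the same inputs and compare representations.
        x = self.query()
        return self.detect(self.represent(victim, x), self.represent(suspect, x))
```

For instance, a label-matching fingerprint is recovered by letting the representation be the list of predicted labels and the detection rule be exact equality.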
Researcher Affiliation | Academia | 1 Université de Rennes, France; 2 Inria, Rennes, France; 3 IRISA/CNRS, Rennes, France; 4 LAAS/CNRS, Toulouse, France; 5 PEReN, Paris, France
Pseudocode | Yes | Our baseline, coined the Anna Karenina Heuristic (AKH), proceeds as follows. First, the victim chooses a negative input: a point x ∈ D such that h wrongly classifies x, i.e. h(x) ≠ c(x). We write D̄_h the resulting negative-input distribution. Then, the victim queries the suspected model h′ on x. Finally, if h′(x) = h(x), the suspected model h′ is flagged as stolen; otherwise h′ is deemed benign. Proposition 1. Consider two models h, h′ ∈ 𝒴^𝒳 and let ᾱ = P(h(x) = c(x)) (resp. ᾱ′ = P(h′(x) = c(x))) be their accuracy. Let Δ = d_H(h, h′) be the relative Hamming distance between h and h′ and Δ_C = P(h(x) ≠ h′(x) | h(x) ≠ c(x)). The property test T_b defined by AKH enjoys the following guarantees: If h′ = h, P_{D̄_h}(T_b(h, h′) = 1) = 1; if h′ ≠ h, P_{D̄_h}(T_b(h, h′) = 0) = Δ_C ≤ Δ / (1 − ᾱ). (1) The proof of Proposition 1 and the detailed algorithm can be found in the technical appendix.
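The AKH description above can be sketched in a few lines of Python. This is a minimal reading of the quoted text, not the paper's detailed algorithm (which is in its technical appendix): the function names, the use of a batch of negative inputs rather than a single one, and the agreement threshold are all assumptions made for illustration.

```python
import numpy as np


def anna_karenina_heuristic(victim_predict, suspect_predict, inputs, true_labels,
                            threshold=0.5):
    """Illustrative sketch of the Anna Karenina Heuristic (AKH).

    The victim keeps only its negative inputs (points x with h(x) != c(x)),
    queries the suspect on them, and flags the suspect as stolen when it
    reproduces enough of the victim's own mistakes. The `threshold`
    parameter is a hypothetical knob, not part of the paper's statement.
    """
    victim_out = victim_predict(inputs)
    # Negative inputs: points the victim misclassifies, h(x) != c(x).
    negatives = victim_out != true_labels
    if not negatives.any():
        return False  # a perfectly accurate victim has no negative inputs
    suspect_out = suspect_predict(inputs[negatives])
    # Agreement rate between suspect and victim on the victim's mistakes.
    agreement = np.mean(suspect_out == victim_out[negatives])
    return bool(agreement >= threshold)
```

An exact copy of the victim reproduces every mistake (agreement 1), while an unrelated model rarely repeats them, which is the intuition behind the guarantees of Proposition 1.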
Open Source Code | Yes | All the code required to re-run our experiments, implement new benchmarks and evaluate new fingerprints is available online at https://github.com/grodino/QuRD.
Open Datasets | Yes | Figure 1 displays the True Positive Rate (TPR@5%) of existing fingerprints on two existing benchmarks, Model Reuse (Li et al. 2021) and SACBench (Guan, Liang, and He 2022). Figure 1 demonstrates that the simple baseline that we introduce (gray dashed lines) performs on par with existing state-of-the-art fingerprinting schemes (coloured dots), which are much more complex.
Dataset Splits | Yes | Figure 1 displays the True Positive Rate (TPR@5%) of existing fingerprints on two existing benchmarks, Model Reuse (Li et al. 2021) and SACBench (Guan, Liang, and He 2022). Figure 1 demonstrates that the simple baseline that we introduce (gray dashed lines) performs on par with existing state-of-the-art fingerprinting schemes (coloured dots), which are much more complex.
Hardware Specification | Yes | This project was provided with AI computing and storage resources by GENCI at IDRIS, thanks to grant AD011015350, on the supercomputer Jean Zay's V100 partition.
Software Dependencies | No | The paper mentions open-sourcing a "fingerprinting toolbox" but does not provide specific software names with version numbers for its dependencies or implementation.
Experiment Setup | Yes | Fingerprint evaluation consists in generating positive and negative model pairs (h, h′), where a positive pair consists of a victim model h and a model h′ stolen from h (e.g. through model extraction), while in a negative pair, h and h′ are totally unrelated (e.g. trained on a different dataset). A collection of such positive and negative pairs is called a benchmark. The True Positive Rate (TPR) is the proportion of positive pairs (h, h′) that are flagged as positive by the fingerprint. The False Positive Rate (FPR) is the proportion of negative pairs (h, h′) that are flagged as positive by the fingerprint. The TPR@5%, i.e. the TPR at 5% FPR, balances the cost to the victim of missing a stolen model against the cost of wrongly flagging a benign model as stolen. All TPR@5% values are averaged over 5 runs with independent random seeds.
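Assuming the fingerprint outputs a similarity score per model pair (higher meaning more likely stolen), the TPR@5% metric described above can be sketched by calibrating the detection threshold on the negative pairs and reading the TPR off the positive pairs. The function name and score convention are illustrative assumptions.

```python
import numpy as np


def tpr_at_fpr(scores_pos, scores_neg, max_fpr=0.05):
    """Sketch of TPR@5%: TPR at the largest threshold whose FPR <= max_fpr.

    scores_pos: similarity scores of the positive (stolen) pairs.
    scores_neg: similarity scores of the negative (unrelated) pairs.
    """
    scores_neg = np.sort(np.asarray(scores_neg))
    # Pick the threshold at the (1 - max_fpr) quantile of the negative scores,
    # so that at most max_fpr of the negative pairs score strictly above it.
    k = int(np.ceil(len(scores_neg) * (1 - max_fpr)))
    threshold = scores_neg[min(k, len(scores_neg) - 1)]
    # TPR: fraction of positive pairs flagged at that threshold.
    return float(np.mean(np.asarray(scores_pos) > threshold))
```

Averaging this quantity over 5 independent seeds, as the paper does, then smooths out the randomness of benchmark generation.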