On the Robustness of Dataset Inference

Authors: Sebastian Szyller, Rui Zhang, Jian Liu, N. Asokan

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then confirm empirically that DI in the black-box setting leads to false positives (FPs) with high confidence. We also show that black-box DI suffers from false negatives (FNs): an adversary who has in fact stolen a victim model can avoid detection by regularising their model with adversarial training. We provide empirical evidence that an adversary who steals the victim's dataset itself and adversarially trains a model can evade detection by DI by trading off accuracy of the stolen model. We empirically demonstrate the existence of FPs in a realistic black-box DI setting (Section 3.2.2).
Researcher Affiliation | Academia | Sebastian Szyller (Aalto University), Rui Zhang (Zhejiang University), Jian Liu (Zhejiang University), N. Asokan (University of Waterloo & Aalto University)
Pseudocode | No | The paper describes methods and algorithms in prose but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described in this paper. It mentions using "the official implementation of DI" for some experiments, but this refers to a third-party tool, not the authors' own code.
Open Datasets | Yes | For the original formulation, e.g. for CIFAR10, CIFAR10-train (50,000 samples) is used as S_V, and CIFAR10-test is used as S_0 (10,000 samples). We use an analogous split for CIFAR100.
Dataset Splits | Yes | 1) randomly split CIFAR10-train into two subsets (A_train and B_train) of 25,000 samples each; 2) assign S_V = A_train, and train f_V using it; 3) continue using CIFAR10-test as S_0 (nothing changes), and train f_0 using it; 4) g_V is trained using the embedding for S_0 and the new S_V, obtained from the new f_V; 5) assign S_I = B_train, independent data of a third-party I, who trains their model f_I.
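The split described in steps 1–5 can be sketched as follows. This is a minimal illustration, assuming a NumPy index split in place of the actual CIFAR10 data loaders; the seed and variable names are placeholders, not values from the paper:

```python
import numpy as np

def split_cifar10_train(n_train=50_000, seed=0):
    """Randomly split the 50,000 CIFAR10-train indices into two
    disjoint 25,000-sample index sets, A_train and B_train."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_train)
    half = n_train // 2
    return perm[:half], perm[half:]

# S_V = A_train trains the victim model f_V; S_I = B_train is the
# independent third party's data for f_I; CIFAR10-test remains S_0.
a_train, b_train = split_cifar10_train()
assert len(a_train) == len(b_train) == 25_000
assert len(np.intersect1d(a_train, b_train)) == 0  # disjoint subsets
```

In practice the index sets would be passed to something like `torch.utils.data.Subset` over the CIFAR10-train dataset.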
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using "projected gradient descent (Madry et al., 2018) (PGD)" and "the official implementation of DI", but it does not specify any software libraries or frameworks with version numbers.
Experiment Setup | Yes | With the weights initialized to zero, f learns the weights using gradient descent with learning rate 1 until y·f(x) is maximized. During adversarial training, each training sample (x, y) is replaced with an adversarial example that is misclassified, f_A(x + γ) ≠ y. We use projected gradient descent (Madry et al., 2018) (PGD), and we set γ = 10/255 (under l∞).
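The l∞ PGD perturbation quoted above can be sketched as below. This is illustrative only: a toy linear model stands in for the adversarially trained model f_A, and the step count and step size `alpha` are assumptions, since the excerpt quotes only the budget γ = 10/255:

```python
import numpy as np

GAMMA = 10 / 255  # l-infinity perturbation budget from the paper

def pgd_attack(x, y, w, b, steps=10, alpha=2 / 255):
    """PGD under an l-infinity ball of radius GAMMA around x.
    x: input vector; y: label in {-1, +1}; (w, b): a toy linear
    model f(x) = w.x + b standing in for f_A (hypothetical)."""
    x_adv = x.copy()
    for _ in range(steps):
        # For a linear model the gradient of the margin y*f(x)
        # w.r.t. x is y*w; step against it to reduce the margin.
        grad = y * w
        x_adv = x_adv - alpha * np.sign(grad)
        # Project back into the l-infinity ball around the clean x.
        x_adv = np.clip(x_adv, x - GAMMA, x + GAMMA)
    return x_adv
```

Adversarial training as described would then replace each sample (x, y) with (pgd_attack(x, y, ...), y) when updating the model.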