Data-Centric Defense: Shaping Loss Landscape with Augmentations to Counter Model Inversion

Authors: Si Chen, Feiyang Kang, Nikhil Abhyankar, Ming Jin, Ruoxi Jia

TMLR 2024

Reproducibility Variable | Result | LLM Response (quoted evidence)
Research Type | Experimental | We evaluate our approach against state-of-the-art MI attacks and demonstrate its effectiveness and robustness across various model architectures and datasets. Specifically, in standard face recognition benchmarks, we reduce face reconstruction success rates to 5%, while maintaining high utility with only a 2% classification accuracy drop, significantly surpassing state-of-the-art model-centric defenses. This is the first study to propose a data-centric approach for mitigating model inversion attacks, showing promising potential for decentralized privacy protection.
Researcher Affiliation | Academia | Si Chen (EMAIL, Virginia Tech); Feiyang Kang (EMAIL, Virginia Tech); Nikhil Abhyankar (EMAIL, Virginia Tech); Ming Jin (EMAIL, Virginia Tech); Ruoxi Jia (EMAIL, Virginia Tech)
Pseudocode | Yes | We refer to the complete injection process as DCD. The pseudocode is provided in Algorithm 1.
Open Source Code | Yes | Our code is available at https://github.com/SCccc21/DCD.git.
Open Datasets | Yes | Datasets and Models. We demonstrate the efficacy of DCD across multiple tasks and datasets that are commonly employed in previous studies on MI attacks (Zhang et al., 2020b; Struppek et al., 2022; An et al., 2022; Chen et al., 2021): (1) Traffic Sign Recognition (GTSRB (Stallkamp et al., 2011)); (2) Face Recognition (CelebA (Liu et al., 2015), FaceScrub (Ng & Winkler, 2014)); and (3) Dog Classification (St. Dogs (Khosla et al., 2011)).
Dataset Splits | Yes | CelebA: a large-scale dataset consisting of 202,599 images of 10,177 different celebrities at a size of 178x218. We further crop the images by a face factor of 0.65 and resize them to 224x224. We use the 1,000 most frequent celebrity faces (the identities with the most samples), which constitutes 27,034 training samples and 3,004 test samples.
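The identity filtering described above (keeping the 1,000 identities with the most samples) can be sketched in plain Python. The function name and toy labels below are illustrative, not taken from the released DCD code:

```python
from collections import Counter

def top_k_identities(labels, k=1000):
    """Return the k identity labels with the most samples,
    most frequent first (ties broken by first appearance)."""
    return [ident for ident, _ in Counter(labels).most_common(k)]

# Toy example: pick the 2 most frequent "identities".
labels = ["a", "b", "a", "c", "a", "b"]
print(top_k_identities(labels, k=2))  # -> ['a', 'b']
```

In the paper's setting, `labels` would be the per-image identity annotations of CelebA, and the training/test splits would then be restricted to images whose identity appears in the returned list.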
Hardware Specification | Yes | The experiments were carried out on one server with eight NVIDIA RTX A6000 GPUs and CUDA 12.1.
Software Dependencies | Yes | We implemented DCD to defend against the existing MI attacks for multiple models and datasets in Python 3.9.12 using PyTorch 1.12.1.
Experiment Setup | Yes | In our main evaluation, we fix ϵ1 = 8/255, ϵ2 = 0.003, and π2 = 1. A sensitivity analysis of defense performance to ϵ2, π1, and π2 is presented in Section 4.4.

Table 7: Privacy parameters in DP-SGD, MID, and BIDO.

| Attack Method | MID β | DP σ | DP δ | DP C | BIDO λx | BIDO λy |
|---------------|-------|------|------|------|---------|---------|
| GMI | 0.2 | 1.0 | 1e-4 | 1.0 | 1.0 | 0.7 |
| PPA | 0.07 | 0.1 | 4e-5 | 1.0 | 0.05 | 0.1 |
| MIRROR | 0.003 | 2.0 | 5e-4 | 1.0 | 4.0 | 20.0 |
| PLG-MI | 0.02 | 0.01 | 4e-5 | 1.0 | 0.1 | 2.0 |
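For reference, the per-attack privacy parameters of the baseline defenses reported in Table 7 can be collected into one configuration mapping. The dictionary below is an illustrative encoding of that table (MID's β; DP-SGD's σ, δ, C; BIDO's λx, λy), not part of the released code:

```python
# Illustrative encoding of Table 7: per-attack privacy parameters for the
# baseline defenses MID (beta), DP-SGD (sigma, delta, C), and BIDO (lambda_x, lambda_y).
BASELINE_PARAMS = {
    "GMI":    {"beta": 0.2,   "sigma": 1.0,  "delta": 1e-4, "C": 1.0, "lambda_x": 1.0,  "lambda_y": 0.7},
    "PPA":    {"beta": 0.07,  "sigma": 0.1,  "delta": 4e-5, "C": 1.0, "lambda_x": 0.05, "lambda_y": 0.1},
    "MIRROR": {"beta": 0.003, "sigma": 2.0,  "delta": 5e-4, "C": 1.0, "lambda_x": 4.0,  "lambda_y": 20.0},
    "PLG-MI": {"beta": 0.02,  "sigma": 0.01, "delta": 4e-5, "C": 1.0, "lambda_x": 0.1,  "lambda_y": 2.0},
}

def dp_sgd_params(attack):
    """Look up only the DP-SGD portion (sigma, delta, C) for a given attack."""
    p = BASELINE_PARAMS[attack]
    return {k: p[k] for k in ("sigma", "delta", "C")}
```

A helper like `dp_sgd_params` (a hypothetical name) keeps each baseline's hyperparameters grouped per attack, which makes it easy to sweep the attacks in Table 7 against a single defense implementation.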