Auditing Privacy Mechanisms via Label Inference Attacks

Authors: Róbert Busa-Fekete, Travis Dick, Claudio Gentile, Andrés Muñoz Medina, Adam Smith, Marika Swanberg

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct a series of experiments on benchmark and synthetic datasets measuring the privacy-utility tradeoff of a number of basic mechanisms."
Researcher Affiliation | Collaboration | Róbert István Busa-Fekete (Google Research NY), Travis Dick (Google Research NY), Claudio Gentile (Google Research NY), Andrés Muñoz Medina (Google Research NY), Adam Smith (Boston University & Google DeepMind), Marika Swanberg (Boston University & Google Research NY)
Pseudocode | No | The paper describes mechanisms and algorithms (e.g., Randomized Response, LLP, PROPMATCH) but does not present any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "Reproducibility. For the sake of full reproducibility of our experimental setting and results, our code is available at the link https://github.com/google-research/google-research/tree/master/auditing_privacy_via_lia"
Open Datasets | Yes | "We use the click prediction data from the KDD Cup 2012, Track 2 [3]... We also use the Higgs dataset [4]..."
Dataset Splits | No | The paper does not explicitly specify a validation split percentage or sample count. It states: "For each dataset, PET, and privacy parameters, we perform a grid search over the learning rate parameter and report the test AUC of the best performing learning rate." This implies hyperparameter tuning, which typically relies on a validation set, but the split used for validation is never stated.
Hardware Specification | Yes | "We conduct our experiments on a cluster of virtual machines each equipped with a P100 GPU, 16 core CPU, and 16GB of memory."
Software Dependencies | No | The paper mentions "minibatch gradient descent with the Adam optimizer [24]" and discusses the scikit-learn package in the NeurIPS checklist answer, but it does not provide version numbers for any software components or libraries required for reproducibility.
Experiment Setup | Yes | "For every PET and every value of their privacy parameters, we train the model with each learning rate in {10^-6, 5*10^-6, 10^-5, 10^-4, 5*10^-4, 10^-3, 5*10^-3, 10^-2}... When training a model on the output of any PET, we always use minibatch gradient descent together with the Adam optimizer [24]."
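The Pseudocode row notes that the paper describes Randomized Response without an algorithm block. As a minimal sketch of what such a mechanism looks like for binary labels (this is textbook randomized response, not the authors' implementation), each label is kept with probability e^ε / (1 + e^ε) and flipped otherwise:

```python
import numpy as np

def randomized_response(labels, epsilon, rng=None):
    """epsilon-DP binary randomized response: keep each label with
    probability e^eps / (1 + e^eps), otherwise flip it."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    flip = rng.random(labels.shape) >= p_keep
    return np.where(flip, 1 - labels, labels)
```

For large ε the labels pass through essentially unchanged; at ε = 0 each label is an unbiased coin flip, which is the private end of the privacy-utility tradeoff the paper audits.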
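The learning-rate sweep quoted in the Experiment Setup row can be sketched as a simple grid search. Here `train_and_eval` is a hypothetical stand-in for training a model on the PET output with minibatch Adam and returning its test AUC; only the grid of rates comes from the paper:

```python
# Learning-rate grid quoted from the paper's experiment setup.
LEARNING_RATES = [1e-6, 5e-6, 1e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2]

def grid_search(train_and_eval, learning_rates=LEARNING_RATES):
    """Run one training/eval per learning rate and return
    (best_learning_rate, best_auc), ranked by test AUC."""
    aucs = {lr: train_and_eval(lr) for lr in learning_rates}
    best_lr = max(aucs, key=aucs.get)
    return best_lr, aucs[best_lr]
```

Note that, as the Dataset Splits row observes, the paper reports the test AUC of the best rate directly rather than selecting it on a held-out validation split.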