Differential Privacy Under Class Imbalance: Methods and Empirical Insights

Authors: Lucas Rosenblatt, Yuliia Lut, Ethan Turok, Marco Avella Medina, Rachel Cummings

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we empirically evaluate these privacy-preserving imbalanced learning methods under various data and distributional settings." "We next empirically evaluate the performance of our methods for private binary classification under class imbalanced data using eight datasets from the Imbalanced-learn (Lemaitre et al., 2017) repository."
Researcher Affiliation | Academia | "1 New York University. 2 Work completed while Y.L. and E.T. were at Columbia University. 3 Columbia University. Correspondence to: Lucas Rosenblatt <EMAIL>."
Pseudocode | Yes | Algorithm 1: Weighted ERM w/ Objective Perturbation; Algorithm 2: SMOTE(X1, N, k); Algorithm 3: Balancing w/ Private Data Synthesizer; Algorithm 4: Differentially Private SGD (with weighted cross-entropy loss)
Open Source Code | No | The paper discusses the source code of a third-party tool or platform that the authors used, but does not provide their own implementation code for the methodology described in this paper.
Open Datasets | Yes | "We evaluate methods under varying privacy and class imbalance conditions on real datasets from the imblearn (Lemaitre et al., 2017) repository."
Dataset Splits | No | The paper mentions "10 randomly seeded data splits" but does not specify split percentages or the splitting methodology (e.g., an 80/10/10 train/validation/test split).
Hardware Specification | Yes | "Neural models (GEM and FTTransformer) were trained using an NVIDIA T4 GPU, with ϵ ∈ {0.05, 0.1, 0.5, 1.0, 5.0} (privacy budget range following guidance from (McKenna et al., 2022))."
Software Dependencies | No | The paper mentions the "Opacus pytorch library" but does not specify a version number for either Opacus or PyTorch.
Experiment Setup | Yes | "Private models were trained for 20 epochs, while non-private models were trained for 100 epochs with early stopping. FTTransformer was initialized with default architecture hyper-parameters (dimension=32, depth=6, 8 heads, dropout of 0.1). DP-SGD was performed with the Opacus pytorch library using recommended parameters (Yousefpour et al., 2021)." Also, Table 12 lists the hyperparameters used when running GEM and their respective descriptions: k=3, T=100, α=0.67, loss_p=2, lr=1e-4, max_idxs=100, max_iters=100, ema_weights_beta=0.9, embedding_dim=512.
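For context on Algorithm 2 above: SMOTE oversamples the minority class by interpolating each minority point toward one of its k nearest minority-class neighbours. The sketch below is a rough pure-Python illustration of that interpolation step, not the authors' implementation; the function name, brute-force neighbour search, and parameters are assumptions for illustration.

```python
import math
import random

def smote(X_min, N, k, rng=None):
    """Generate N synthetic minority samples by interpolating each chosen
    minority point toward one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(N):
        x = rng.choice(X_min)
        # k nearest neighbours of x within the minority class (excluding x itself)
        neighbours = sorted(
            (p for p in X_min if p is not x),
            key=lambda p: math.dist(p, x),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

Because each synthetic point is a convex combination of two minority points, the generated samples always lie inside the convex hull of the minority class.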
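Algorithm 4 combines DP-SGD with a weighted cross-entropy loss: each example's gradient is scaled by its class weight, clipped to a fixed L2 norm, and the summed batch gradient is perturbed with Gaussian noise before the update. The paper's experiments use Opacus; the following is only a minimal single-step sketch for logistic regression under assumed parameter names, to show where the class weighting enters relative to clipping and noising.

```python
import math
import random

def dp_sgd_step(w, batch, class_weights, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One DP-SGD step for logistic regression with weighted cross-entropy.

    Per-example gradients are weighted by class, clipped to L2 norm `clip`,
    summed, perturbed with Gaussian noise of scale sigma * clip, averaged,
    and applied as a gradient step."""
    rng = rng or random.Random(0)
    d = len(w)
    grad_sum = [0.0] * d
    for x, y in batch:
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        weight = class_weights[y]                  # up-weights the minority class
        g = [weight * (p - y) * xi for xi in x]    # weighted cross-entropy gradient
        norm = math.sqrt(sum(gi * gi for gi in g))
        scale = min(1.0, clip / (norm + 1e-12))    # per-example gradient clipping
        for i in range(d):
            grad_sum[i] += g[i] * scale
    noisy = [(grad_sum[i] + rng.gauss(0.0, sigma * clip)) / len(batch)
             for i in range(d)]
    return [w[i] - lr * noisy[i] for i in range(d)]
```

Note the ordering: the class weight is applied before clipping, so the sensitivity of the summed gradient remains bounded by `clip` regardless of the weights, which is what keeps the Gaussian-mechanism privacy accounting unchanged.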