Differential Privacy Under Class Imbalance: Methods and Empirical Insights
Authors: Lucas Rosenblatt, Yuliia Lut, Ethan Turok, Marco Avella Medina, Rachel Cummings
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically evaluate these privacy-preserving imbalanced learning methods under various data and distributional settings. We next empirically evaluate the performance of our methods for private binary classification under class imbalanced data using eight datasets from the Imbalanced-learn (Lemaitre et al., 2017) repository. |
| Researcher Affiliation | Academia | 1 New York University. 2 Work completed while Y.L. and E.T. were at Columbia University. 3 Columbia University. Correspondence to: Lucas Rosenblatt <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Weighted ERM w/ Objective Perturbation; Algorithm 2 SMOTE(X1, N, k); Algorithm 3 Balancing w/ Private Data Synthesizer; Algorithm 4 Differentially Private SGD (with weighted Cross-Entropy Loss) |
| Open Source Code | No | The paper discusses the source code of a third-party tool or platform that the authors used, but does not provide their own implementation code for the methodology described in this paper. |
| Open Datasets | Yes | We evaluate methods under varying privacy and class imbalance conditions on real datasets from the imblearn (Lemaitre et al., 2017) repository |
| Dataset Splits | No | The paper mentions "10 randomly seeded data splits" but does not provide specific percentages or methodology (e.g., 80/10/10 train/val/test splits). |
| Hardware Specification | Yes | Neural models (GEM and FTTransformer) were trained using an NVIDIA T4 GPU, with ϵ ∈ {0.05, 0.1, 0.5, 1.0, 5.0} (privacy budget range following guidance from (McKenna et al., 2022)). |
| Software Dependencies | No | The paper mentions the 'Opacus pytorch library' but does not specify a version number for either Opacus or PyTorch. |
| Experiment Setup | Yes | Private models were trained for 20 epochs, while non-private models were trained for 100 epochs with early stopping. FTTransformer was initialized with default architecture hyper-parameters (dimension=32, depth=6, 8 heads, dropout of 0.1). DP-SGD was performed with the Opacus pytorch library using recommended parameters (Yousefpour et al., 2021). Also, Table 12: Hyperparameters we used when running GEM and their respective descriptions (k=3, T=100, α=0.67, loss_p=2, lr=1e-4, max_idxs=100, max_iters=100, ema_weights_beta=0.9, embedding_dim=512). |
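The paper's Algorithm 2 is SMOTE, the standard minority-oversampling routine (Chawla et al., 2002) that the private balancing methods build on. Since the authors release no code, the sketch below is a minimal, hypothetical NumPy rendition of vanilla SMOTE — not the paper's private variant — interpolating between each sampled minority point and one of its k nearest minority neighbors:

```python
import numpy as np

def smote(X_min, N, k, rng=None):
    """Minimal SMOTE sketch (non-private): generate N synthetic minority
    samples by linear interpolation toward a random one of each sampled
    point's k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    n, d = X_min.shape
    # Pairwise squared distances among minority samples only.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]      # k nearest neighbors per point
    synth = np.empty((N, d))
    for i in range(N):
        j = rng.integers(n)                 # random minority sample
        nb = X_min[rng.choice(nn[j])]       # random neighbor of that sample
        gap = rng.random()                  # interpolation factor in [0, 1)
        synth[i] = X_min[j] + gap * (nb - X_min[j])
    return synth
```

Because each synthetic point is a convex combination of two real minority points, the output stays inside the minority class's convex hull — which is also why naive SMOTE leaks information about individual records and motivates the paper's private alternatives.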
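Algorithm 4 pairs DP-SGD with a weighted cross-entropy loss to counter class imbalance. The paper does not publish its loss code; the snippet below is an illustrative NumPy sketch of a class-weighted binary cross-entropy (the `w_pos` up-weighting of the minority/positive class is the assumed mechanism), independent of any specific DP-SGD implementation such as Opacus:

```python
import numpy as np

def weighted_bce(probs, labels, w_pos):
    """Class-weighted binary cross-entropy: positive (minority) examples
    are up-weighted by w_pos so the gradient is not dominated by the
    majority class. probs are predicted P(y=1); labels are 0/1."""
    probs = np.clip(probs, 1e-12, 1 - 1e-12)  # numerical stability
    per_sample = -(w_pos * labels * np.log(probs)
                   + (1 - labels) * np.log(1 - probs))
    return per_sample.mean()
```

Setting `w_pos` to the majority/minority class ratio is the usual heuristic; under DP-SGD the per-sample weighting interacts with gradient clipping, which is part of what the paper evaluates.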