Differential Privacy Under Class Imbalance: Methods and Empirical Insights
Authors: Lucas Rosenblatt, Yuliia Lut, Ethan Turok, Marco Avella Medina, Rachel Cummings
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically evaluate these privacy-preserving imbalanced learning methods under various data and distributional settings. We next empirically evaluate the performance of our methods for private binary classification under class imbalanced data using eight datasets from the Imbalanced-learn (Lemaitre et al., 2017) repository. |
| Researcher Affiliation | Academia | 1 New York University. 2 Work completed while Y.L. and E.T. were at Columbia University. 3 Columbia University. Correspondence to: Lucas Rosenblatt <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Weighted ERM w/ Objective Perturbation; Algorithm 2 SMOTE(X1, N, k); Algorithm 3 Balancing w/ Private Data Synthesizer; Algorithm 4 Differentially Private SGD (with weighted Cross-Entropy Loss) |
| Open Source Code | No | The paper discusses the source code of a third-party tool or platform that the authors used, but does not provide their own implementation code for the methodology described in this paper. |
| Open Datasets | Yes | We evaluate methods under varying privacy and class imbalance conditions on real datasets from the imblearn (Lemaitre et al., 2017) repository |
| Dataset Splits | No | The paper mentions "10 randomly seeded data splits" but does not provide specific percentages or methodology (e.g., 80/10/10 train/val/test splits). |
| Hardware Specification | Yes | Neural models (GEM and FTTransformer) were trained using an NVIDIA T4 GPU, with ϵ ∈ {0.05, 0.1, 0.5, 1.0, 5.0} (privacy budget range following guidance from (McKenna et al., 2022)). |
| Software Dependencies | No | The paper mentions the 'Opacus pytorch library' but does not specify a version number for either Opacus or PyTorch. |
| Experiment Setup | Yes | Private models were trained for 20 epochs, while non-private models were trained for 100 epochs with early stopping. FTTransformer was initialized with default architecture hyper-parameters (dimension=32, depth=6, 8 heads, dropout of 0.1). DP-SGD was performed with the Opacus pytorch library using recommended parameters (Yousefpour et al., 2021). Also, Table 12: Hyperparameters we used when running GEM and their respective descriptions (k=3, T=100, α=0.67, loss_p=2, lr=1e-4, max_idxs=100, max_iters=100, ema_weights_beta=0.9, embedding_dim=512). |
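The paper's Algorithm 2 is SMOTE, the standard minority-oversampling routine (Chawla et al., 2002) that the private balancing methods build on. Since the authors release no code, the sketch below is a minimal, hypothetical NumPy rendition of vanilla SMOTE — not the paper's private variant — interpolating between each sampled minority point and one of its k nearest minority neighbors:

```python
import numpy as np

def smote(X_min, N, k, rng=None):
    """Minimal SMOTE sketch (non-private): generate N synthetic minority
    samples by linear interpolation toward a random one of each sampled
    point's k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    n, d = X_min.shape
    # Pairwise squared distances among minority samples only.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]      # k nearest neighbors per point
    synth = np.empty((N, d))
    for i in range(N):
        j = rng.integers(n)                 # random minority sample
        nb = X_min[rng.choice(nn[j])]       # random neighbor of that sample
        gap = rng.random()                  # interpolation factor in [0, 1)
        synth[i] = X_min[j] + gap * (nb - X_min[j])
    return synth
```

Because each synthetic point is a convex combination of two real minority points, the output stays inside the minority class's convex hull — which is also why naive SMOTE leaks information about individual records and motivates the paper's private alternatives.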
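Algorithm 4 pairs DP-SGD with a weighted cross-entropy loss to counter class imbalance. The paper does not publish its loss code; the snippet below is an illustrative NumPy sketch of a class-weighted binary cross-entropy (the `w_pos` up-weighting of the minority/positive class is the assumed mechanism), independent of any specific DP-SGD implementation such as Opacus:

```python
import numpy as np

def weighted_bce(probs, labels, w_pos):
    """Class-weighted binary cross-entropy: positive (minority) examples
    are up-weighted by w_pos so the gradient is not dominated by the
    majority class. probs are predicted P(y=1); labels are 0/1."""
    probs = np.clip(probs, 1e-12, 1 - 1e-12)  # numerical stability
    per_sample = -(w_pos * labels * np.log(probs)
                   + (1 - labels) * np.log(1 - probs))
    return per_sample.mean()
```

Setting `w_pos` to the majority/minority class ratio is the usual heuristic; under DP-SGD the per-sample weighting interacts with gradient clipping, which is part of what the paper evaluates.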