On the Price of Differential Privacy for Hierarchical Clustering

Authors: Chengyuan Deng, Jie Gao, Jalaj Upadhyay, Chen Wang, Samson Zhou

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We implement our new algorithm HIERARCHICALCLUSTERING-DP and demonstrate the experimental results in this section. First, we evaluate the performance of our algorithm in terms of Dasgupta s objective on synthetic and real-world graphs. Next, we show our algorithm has favorable scalability with large graphs (n 1000).
Researcher Affiliation Academia Rutgers University, EMAIL Texas A&M University, EMAIL
Pseudocode Yes BALANCEDSPARSESTCUT-DP: An ε-DP Algorithm for Balanced Sparsest Cut Input: Graph G = (V, E, w), |V | = n, privacy parameter ε > 0. 1. e E, add 10 log n/ε to the edge weight w(e), denoted as G = (V, E, w ) 2. e E, add independent noise Lap(1/ε) to w (e), denoted as G = (V, E, w ). 3. Run the algorithm from Proposition 3.1 on G and return the cut S.
Open Source Code Yes Our codes are publicly available on the Anonymous Github2. 2https://anonymous.4open.science/r/dp-hc-8612/
Open Datasets Yes we generate two datasets of synthetic graphs from the stochastic block model (SBM) and hierarchical SBM (HSBM), then we select real-world datasets: IRIS, WINE and BOSTON from the scikit-learn package (Pedregosa et al., 2011).
Dataset Splits No The paper does not explicitly mention training/testing/validation dataset splits, only how synthetic graphs were generated and real-world similarity graphs constructed. For synthetic graphs, it mentions '10 different graphs generated by the same set of parameters where p = 0.7 (intra-probability) and q = 0.1 (interprobability)'. For real-world graphs, it describes 'similarity graph is constructed via the Gaussian kernel' but no splits.
Hardware Specification Yes All of our experiments are implemented on a Macbook Pro with M2 CPU.
Software Dependencies No The paper mentions 'scikit-learn package' but does not specify its version number. No other software dependencies are listed with specific versions.
Experiment Setup Yes The results shown in this section are based on 10 different graphs generated by the same set of parameters where p = 0.7 (intra-probability) and q = 0.1 (interprobability). ... As for the choice of ε, the privacy parameter, we test all algorithms on ε {0.01, 0.1, 0.5, 1, 2}.