On the Price of Differential Privacy for Hierarchical Clustering
Authors: Chengyuan Deng, Jie Gao, Jalaj Upadhyay, Chen Wang, Samson Zhou
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our new algorithm HIERARCHICALCLUSTERING-DP and demonstrate the experimental results in this section. First, we evaluate the performance of our algorithm in terms of Dasgupta s objective on synthetic and real-world graphs. Next, we show our algorithm has favorable scalability with large graphs (n 1000). |
| Researcher Affiliation | Academia | Rutgers University, EMAIL Texas A&M University, EMAIL |
| Pseudocode | Yes | BALANCEDSPARSESTCUT-DP: An ε-DP Algorithm for Balanced Sparsest Cut Input: Graph G = (V, E, w), |V | = n, privacy parameter ε > 0. 1. e E, add 10 log n/ε to the edge weight w(e), denoted as G = (V, E, w ) 2. e E, add independent noise Lap(1/ε) to w (e), denoted as G = (V, E, w ). 3. Run the algorithm from Proposition 3.1 on G and return the cut S. |
| Open Source Code | Yes | Our codes are publicly available on the Anonymous Github2. 2https://anonymous.4open.science/r/dp-hc-8612/ |
| Open Datasets | Yes | we generate two datasets of synthetic graphs from the stochastic block model (SBM) and hierarchical SBM (HSBM), then we select real-world datasets: IRIS, WINE and BOSTON from the scikit-learn package (Pedregosa et al., 2011). |
| Dataset Splits | No | The paper does not explicitly mention training/testing/validation dataset splits, only how synthetic graphs were generated and real-world similarity graphs constructed. For synthetic graphs, it mentions '10 different graphs generated by the same set of parameters where p = 0.7 (intra-probability) and q = 0.1 (interprobability)'. For real-world graphs, it describes 'similarity graph is constructed via the Gaussian kernel' but no splits. |
| Hardware Specification | Yes | All of our experiments are implemented on a Macbook Pro with M2 CPU. |
| Software Dependencies | No | The paper mentions 'scikit-learn package' but does not specify its version number. No other software dependencies are listed with specific versions. |
| Experiment Setup | Yes | The results shown in this section are based on 10 different graphs generated by the same set of parameters where p = 0.7 (intra-probability) and q = 0.1 (interprobability). ... As for the choice of ε, the privacy parameter, we test all algorithms on ε {0.01, 0.1, 0.5, 1, 2}. |