reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Price of Differential Privacy for Hierarchical Clustering

Authors: Chengyuan Deng, Jie Gao, Jalaj Upadhyay, Chen Wang, Samson Zhou

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We implement our new algorithm HIERARCHICALCLUSTERING-DP and demonstrate the experimental results in this section. First, we evaluate the performance of our algorithm in terms of Dasgupta s objective on synthetic and real-world graphs. Next, we show our algorithm has favorable scalability with large graphs (n 1000).
Researcher Affiliation	Academia	Rutgers University, EMAIL Texas A&M University, EMAIL
Pseudocode	Yes	BALANCEDSPARSESTCUT-DP: An ε-DP Algorithm for Balanced Sparsest Cut Input: Graph G = (V, E, w), \|V \| = n, privacy parameter ε > 0. 1. e E, add 10 log n/ε to the edge weight w(e), denoted as G = (V, E, w ) 2. e E, add independent noise Lap(1/ε) to w (e), denoted as G = (V, E, w ). 3. Run the algorithm from Proposition 3.1 on G and return the cut S.
Open Source Code	Yes	Our codes are publicly available on the Anonymous Github2. 2https://anonymous.4open.science/r/dp-hc-8612/
Open Datasets	Yes	we generate two datasets of synthetic graphs from the stochastic block model (SBM) and hierarchical SBM (HSBM), then we select real-world datasets: IRIS, WINE and BOSTON from the scikit-learn package (Pedregosa et al., 2011).
Dataset Splits	No	The paper does not explicitly mention training/testing/validation dataset splits, only how synthetic graphs were generated and real-world similarity graphs constructed. For synthetic graphs, it mentions '10 different graphs generated by the same set of parameters where p = 0.7 (intra-probability) and q = 0.1 (interprobability)'. For real-world graphs, it describes 'similarity graph is constructed via the Gaussian kernel' but no splits.
Hardware Specification	Yes	All of our experiments are implemented on a Macbook Pro with M2 CPU.
Software Dependencies	No	The paper mentions 'scikit-learn package' but does not specify its version number. No other software dependencies are listed with specific versions.
Experiment Setup	Yes	The results shown in this section are based on 10 different graphs generated by the same set of parameters where p = 0.7 (intra-probability) and q = 0.1 (interprobability). ... As for the choice of ε, the privacy parameter, we test all algorithms on ε {0.01, 0.1, 0.5, 1, 2}.