Differentially Private Source-Target Clustering
Authors: Shachar Schnapp, Sivan Sabato
TMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate in experiments the reduction in clustering cost that is obtained by our practical algorithms compared to baseline approaches. Code is publicly available on https://github.com/Shachar Schnapp/STC. [...] We ran our algorithms on synthetic and real-world datasets, and compared them to the following ε-DP baselines: |
| Researcher Affiliation | Academia | Shachar Schnapp EMAIL Department of Computer Science, Ben-Gurion University of the Negev Sivan Sabato EMAIL Department of Computing and Software, Mc Master University Canada CIFAR AI Chair, Vector Institute Department of Computer Science, Ben-Gurion University of the Negev |
| Pseudocode | Yes | Algorithm 1 Noisy Average Set (NAS) [...] Algorithm 2 Neighbor Noisy Averages (NNA) |
| Open Source Code | Yes | Code is publicly available on https://github.com/Shachar Schnapp/STC. [...] The python code is publicly available on https://github.com/Shachar Schnapp/STC. |
| Open Datasets | Yes | We ran our algorithms on synthetic and real-world datasets [...] MNIST (Deng, 2012) contains 70,000 grayscale images of handwritten digits. [...] Office (Saenko et al., 2010) contains images of office items from different sources: [...] Superconductivity (Hamidieh, 2018) is an 82-dimensional dataset of 16,000 superconducting materials. |
| Dataset Splits | Yes | We tested three (source,target) pairs of digits: (1,7), (5,2) and (9, 6). [...] We split the dataset into four subsets termed low (l), middle-low (ml), middle-high (mh) and high (h) of around 4000 instances each. We tested all possible (source, target) pairs. |
| Hardware Specification | Yes | All run times were measured when running on one core of an Intel i9-9900K CPU and NVIDIA GEFORCE RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'python code' and refers to specific algorithms like 'Accelerated K-medoids (Ak M)' but does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | For Alg. 1, we tested several values of t and got similar results, thus we provide below results for t = 150, and report the others in Appendix C.1. [...] We fixed ε = 3 for DP and ρ = 3 for z CDP. For each dataset, algorithm and k, we averaged Cost(T , S, Tk) over 30 runs. [...] In all of the experiments, the points were normalized to have a maximal norm of 1/2. |