Dirichlet Process-Based Robust Clustering Using the Median-of-Means Estimator
Authors: Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul, Swagatam Das
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Statistical guarantees on an upper bound of clustering error and rigorous assessment through simulated and real datasets suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms. |
| Researcher Affiliation | Academia | (1) Department of Statistical Science, Duke University, USA; (2) H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, USA; (3) Machine Learning Research Laboratory, ECSU, Indian Statistical Institute, Kolkata, India; (4) Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, India |
| Pseudocode | Yes | Algorithm 1: Dirichlet Process Clustering using Median-of-Means (DP-MoM) |
| Open Source Code | Yes | The code can be found at https://github.com/jyotishkarc/DP-MoM. |
| Open Datasets | Yes | Our first experiment involves implementing the aforementioned techniques on several datasets from the UCI Machine Learning Repository and the Compcancer database. |
| Dataset Splits | No | The paper describes simulation studies where data points are generated and outliers are introduced in stages, and mentions running the randomized algorithm 35 times. However, it does not provide specific training/test/validation dataset splits with percentages, absolute counts, or references to predefined standard splits for reproduction of the experiments. |
| Hardware Specification | Yes | The simulation experiments were conducted using a computer equipped with Intel(R) Core(TM) i3-7020U 2.30GHz processor, 4GB RAM, 64-bit Windows 10 operating system in the R programming language [R Core Team, 2022]. |
| Software Dependencies | Yes | The simulation experiments were conducted in the R programming language [R Core Team, 2022]. |
| Experiment Setup | Yes | The tuning parameter ε is set to 1. The learning rate η is typically chosen to be the power of 10 that is of the order of the squared maximum pairwise distance in the dataset, or one power lower. That is, if the maximum squared separation between any two observations in the data is $D$, then we set $\eta = 10^{\lceil \log_{10} D/2 \rceil}$ or $\eta = 10^{\lceil \log_{10} D/2 \rceil - 1}$, depending on which of these values leads to more efficient clustering with our proposed method, where $\lceil \cdot \rceil$ denotes the ceiling function. |
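The Pseudocode row refers to a median-of-means (MoM) step. The sketch below covers only the generic MoM estimator of a mean. It is not the authors' Algorithm 1, whose details are not reproduced here, and it is written in Python rather than the R of the released code. The estimator works in four steps: shuffle the sample, split it into disjoint blocks, average each block, and return the median of the block means. The function name `median_of_means` and its `n_blocks` and `seed` parameters are hypothetical choices made for illustration.

```python
import random
from statistics import median


def median_of_means(values, n_blocks, seed=None):
    """Generic median-of-means estimate of the mean of a 1-D sample.

    Hypothetical helper for illustration; not taken from the DP-MoM code.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    data = list(values)
    # Random assignment of points to blocks, reproducible via `seed`
    random.Random(seed).shuffle(data)
    # Block b takes every n_blocks-th element starting at index b,
    # giving disjoint blocks whose sizes differ by at most one
    blocks = [data[b::n_blocks] for b in range(n_blocks)]
    block_means = [sum(block) / len(block) for block in blocks]
    # The median of the block means ignores a minority of contaminated blocks
    return median(block_means)


data = [1.0] * 20 + [1000.0, 1000.0]
print(sum(data) / len(data))                 # plain mean, pulled up by outliers
print(median_of_means(data, 5, seed=0))      # MoM estimate
```

In this example there are 20 inliers equal to 1.0 and two outliers at 1000.0. The plain mean is about 91.8, while the MoM estimate with five blocks is 1.0. The two outliers can contaminate at most two of the five blocks, so the median block mean always comes from an uncontaminated block.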
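A short worked computation can illustrate the learning-rate rule in the Experiment Setup row. The sketch makes two assumptions: distances are Euclidean, and $\log_{10} D/2$ is read as $(\log_{10} D)/2$. The source text does not settle that grouping. The helper names `max_squared_distance` and `candidate_learning_rates` are hypothetical and do not come from the authors' R code.

```python
import math
from itertools import combinations


def max_squared_distance(points):
    """Largest squared Euclidean distance D over all pairs of points (O(n^2))."""
    return max(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in combinations(points, 2)
    )


def candidate_learning_rates(points):
    """Return the two candidate values of eta described in the setup.

    ASSUMPTION: log10 D/2 is interpreted as (log10 D) / 2.
    """
    D = max_squared_distance(points)
    if D <= 0:
        raise ValueError("need at least two distinct points")
    k = math.ceil(math.log10(D) / 2)
    # First candidate: 10^k; second candidate: one power of 10 lower
    return 10.0 ** k, 10.0 ** (k - 1)


pts = [(0, 0), (3, 4), (30, 40)]
print(max_squared_distance(pts))
print(candidate_learning_rates(pts))
```

Take the points (0, 0), (3, 4) and (30, 40). Here D = 2500 and ⌈(log₁₀ 2500)/2⌉ = ⌈1.699⌉ = 2, so the two candidates are η = 100 and η = 10. The choice between them is then made empirically, as the setup describes.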