Hierarchical Overlapping Clustering on Graphs: Cost Function, Algorithm and Scalability
Authors: Yicheng Pan, Renjie Chen, Pengyu Long, Bingchen Fan
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that this speed-up algorithm outperforms all baseline methods significantly in both effectiveness (across synthetic and real datasets) and scalability. ... In this section, we verify by experiments the effectiveness and scalability of the speed-up version of Algorithm 2, which demonstrates the validity of our cost function as well. |
| Researcher Affiliation | Academia | 1State Key Laboratory of Complex & Critical Software Environment, Beihang University. Correspondence to: Yicheng Pan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Algorithm for 2-OC Algorithm 2 Algorithm for k-HOC Algorithm 3 Speed-up algorithm for 2-OC |
| Open Source Code | Yes | For the source codes and datasets, please refer to https://github.com/Hardict/HOC. |
| Open Datasets | Yes | For synthetic datasets, we employ the overlapping stochastic block model (OSBM)... For real datasets, we adopt Amazon, Youtube, and DBLP (Yang & Leskovec, 2012b) that are provided by http://snap.stanford.edu/data and are also used in (Orecchia et al., 2022). To show intuitively that our algorithm is able to find out the blurred overlapping area of datasets, we run our 2-OC algorithm on the MNIST dataset (Le Cun et al., 1998) |
| Dataset Splits | No | The paper uses synthetic datasets based on OSBM with specific generation parameters (p1, p2, p3) and real-world datasets (Amazon, Youtube, DBLP, MNIST). It describes how the OSBM is modified and parameters for cluster generation, and for MNIST it specifies selecting pairs of labels and constructing a k-nearest neighbor graph. However, it does not provide explicit train/test/validation splits for any of the datasets used in its experiments. |
| Hardware Specification | Yes | Our experiments were performed on a personal computer equipped with a 2.3GHz quad-core Intel i5 processor with 8GB memory. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies (e.g., programming languages, libraries, frameworks) used to implement their proposed methods. It only mentions the HIPR implementation for a baseline method without specifying its version for their own work. |
| Experiment Setup | Yes | In our experiment, we set ℓ= 32. ... In the fist row, p1 = 10 -3, p2 = 5 10 -3, p3 = 0.5. In the second row, p1 = 10 -4, p2 = 2 10 -4, p3 = 5 10 -3. Regard to the last two columns of NMI results, level 1 is the first level that contains the two high-level clusters, and level 2 is the second one that contains the four low-level clusters. Each point is calculated on average over 5 trials, and error bar indicates standard deviation. |