The Hyperspherical Geometry of Community Detection: Modularity as a Distance

Authors: Martijn Gösgens, Remco van der Hofstad, Nelly Litvak

JMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In Section 6, we compare a number of projection methods on real-world networks and interpret the results in our geometric framework. In this section, we perform more experiments to demonstrate how the geometric results reported in this paper help to interpret and understand outcomes of the projection method for different query mappings. In Section 6.1, we illustrate some phenomena that we empirically observe for many networks and query mappings, while in Section 6.2, we compare the performance of different query mappings on several real-world networks.
Researcher Affiliation Academia Martijn Gösgens EMAIL Eindhoven University of Technology, Eindhoven, Netherlands Remco van der Hofstad EMAIL Eindhoven University of Technology, Eindhoven, Netherlands Nelly Litvak EMAIL University of Twente, Enschede, Netherlands Eindhoven University of Technology, Eindhoven, Netherlands
Pseudocode No The paper describes the Louvain algorithm conceptually but does not provide it in a structured pseudocode block or algorithm listing. For example, it states: "The Louvain algorithm is a greedy algorithm that is initialized at the clustering corresponding to the fine pole 1, it follows that L(q) cannot be further from q than 1" but this is a description, not pseudocode.
Open Source Code Yes The code that was used to perform the experiments and generate the figures of this paper is available on Git Hub.5 Amongst others, this repository contains a Python implementation of our modification of the Louvain algorithm. 5. See https://github.com/MartijnGosgens/hyperspherical_community_detection.
Open Datasets Yes We consider 6 real-world networks from Prokhorenkova and Tikhonov (2019) that have known ground truth communities. These networks are: 1) Zachary s well-known karate club network (Zachary, 1977); 2) Lusseau s network of bottlenose dolphins (Lusseau and Newman, 2004); 3) a network of political books grouped by political affiliation (Newman, 2006b); 4) a network of college football teams grouped by conference (Girvan and Newman, 2002); 5) the EU-core network of European researchers linked by email traffic and grouped by department; and 6) a network of political blogs concerning the US presidential election of 2004, grouped by party affiliation (Adamic and Glance, 2005). In this experiment, we generate random networks using the LFR benchmark (Lancichinetti et al., 2008)
Dataset Splits No The paper uses real-world networks with 'known ground truth communities' and generates synthetic networks (Planted Partition Model, LFR benchmark), which have inherent community structures. However, it does not specify any explicit training, test, or validation splits for these datasets. For example, for the PPM, it says: "The ground truth clustering consists of 20 communities, each of size 20, while the mean intraand inter-community degrees are 6 and 4, respectively" but this describes the dataset generation, not how it was split for a typical machine learning experiment.
Hardware Specification No The paper discusses computational cost and running times for experiments, especially for different power-law exponents. It mentions: "We expect that implementing the projection method for the wedge vector in a similar optimized way would result in a similar improvement in running time." and "We emphasize that we have implemented our modification of Louvain in Python, which is significantly slower than other programming languages (e.g., C++) for such tasks." However, it does not specify any concrete hardware details such as CPU, GPU, or memory used for these computations.
Software Dependencies No The paper mentions: "This repository contains a Python implementation of our modification of the Louvain algorithm." and "For example, the Networ Kit (Staudt et al., 2016) implementation of the Louvain algorithm for maximizing the Newman-Girvan modularity on the LFR network of 100,000 vertices with τ = 4 takes only 0.46 seconds." It names Python and NetworKit but does not specify any version numbers for these or other critical libraries.
Experiment Setup Yes For an illustrational experiment, we use query vectors on the ER meridian to detect communities in a Planted Partition Model (PPM). The ground truth clustering consists of 20 communities, each of size 20, while the mean intraand inter-community degrees are 6 and 4, respectively. The Louvain algorithm is also known to efficiently optimize other partition-based functions (Prokhorenkova and Tikhonov, 2019). In this work, we use the Louvain algorithm to minimize the angular distance to a query vector. The Louvain algorithm is a greedy algorithm that is initialized at the clustering corresponding to the fine pole 1