Modularity aided consistent attributed graph clustering via coarsening

Authors: Yukti Makhija, Samarth Bhatia, Manoj Kumar, Sandeep Kumar

TMLR 2025

Reproducibility variables, results, and supporting excerpts:
Research Type: Experimental
  Evidence: "Extensive experiments on benchmark datasets demonstrate its superiority over existing state-of-the-art methods for both attributed and non-attributed graphs." (Section 5, Experiments)
Researcher Affiliation: Academia
  Evidence: Samarth Bhatia* (Indian Institute of Technology, Delhi); Yukti Makhija* (Indian Institute of Technology, Delhi); Manoj Kumar (LNM Institute of Technology, Jaipur); Sandeep Kumar (Indian Institute of Technology, Delhi)
Pseudocode: Yes
  Evidence: Algorithm 1, Q-MAGC Algorithm
    Require: G(X, Θ), α, β, γ, λ
    1: t ← 0
    2: while stopping criteria not met do
    3:   Update C^(t+1) as in Equation 11
    4:   Update X_C^(t+1) as in Equation 14
    5:   t ← t + 1
    6: end while
    7: return C^t, X_C^t
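The alternating scheme of Algorithm 1 can be sketched in plain Python as below. Note that `update_C` and `update_XC` are hypothetical stand-ins for the paper's Equation 11 and Equation 14 updates (here, simple damped averaging so the stopping criterion is reachable); only the loop structure mirrors the algorithm.

```python
# Sketch of the Q-MAGC alternating-update loop (Algorithm 1).
# update_C / update_XC are hypothetical placeholders, NOT the paper's
# Equation 11 / Equation 14; they damp the iterates toward a fixed point
# so the convergence check below can trigger.

def update_C(C, XC):
    # Placeholder for Equation 11: move C halfway toward X_C.
    return [(c + x) / 2 for c, x in zip(C, XC)]

def update_XC(C, XC):
    # Placeholder for Equation 14: move X_C halfway toward C.
    return [(c + x) / 2 for c, x in zip(C, XC)]

def qmagc(C, XC, tol=1e-6, max_iter=1000):
    """Alternate the two updates until successive C iterates stop changing."""
    for t in range(max_iter):
        C_next = update_C(C, XC)          # line 3 of Algorithm 1
        XC = update_XC(C_next, XC)        # line 4 of Algorithm 1
        if max(abs(a - b) for a, b in zip(C_next, C)) < tol:
            return C_next, XC, t + 1      # stopping criterion met
        C = C_next
    return C, XC, max_iter
```

With the placeholder updates the two iterates contract toward a common point, so the loop terminates well before `max_iter`.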
Open Source Code: Yes
  Evidence: "The implementations for all the experiments can be found at https://github.com/plutonium-239/MAGC." (Appendix A, Implementation)
Open Datasets: Yes
  Evidence: "We evaluate our method on a diverse set of datasets, including small attributed datasets like Cora and CiteSeer, larger datasets like PubMed, and unattributed datasets such as Airports (Brazil, Europe, and USA). A summary of these can be seen in Table 1. Additionally, we test our method on very large datasets like Coauthor CS/Physics, Amazon Photo/PC, and ogbn-arxiv. A detailed summary of all the datasets used is provided in Appendix J. We use datasets directly from the pytorch_geometric package, so no preprocessing is needed."
Dataset Splits: No
  Evidence: The paper mentions "full-batch training" and "batching" for certain large graphs like ogbn-arxiv, but it does not specify explicit train/validation/test split percentages or detailed methodologies for data partitioning across the datasets used in the experiments.
Hardware Specification: Yes
  Evidence: "All experiments were run on an NVIDIA A100 GPU and Intel Xeon 2680 CPUs."
Software Dependencies: Yes
  Evidence: "All experiments used the same environment running CentOS 7, Python 3.9.12, PyTorch 2.0, PyTorch Geometric 2.2.0."
Experiment Setup: Yes
  Evidence: Hyperparameter search ranges: learning rate [0.001, 0.1]; α [500, 10000]; β [10, 250]; γ [100, 1000]; λ [0, 100]; λ_recon [10, 250]; λ_kl [0.001, 0.1]. "All experiments were run on an NVIDIA A100 GPU and Intel Xeon 2680 CPUs. We usually run 4-16 experiments together to utilize resources (for example, in 40GB of GPU memory, we can run 8 experiments on PubMed simultaneously)."
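The reported search ranges can be encoded as a sampling space, sketched below. The range endpoints come from the paper; the choice of log-uniform sampling for the learning rate and λ_kl (versus uniform for the rest) is an assumption for illustration, as the paper does not state a sampling distribution.

```python
# Sketch: drawing one configuration from the reported hyperparameter ranges.
# Endpoints are from the paper; the "log" vs "lin" sampling scales are an
# assumption, not stated in the text.
import math
import random

SEARCH_SPACE = {
    "lr":           (0.001, 0.1,   "log"),
    "alpha":        (500,   10000, "lin"),
    "beta":         (10,    250,   "lin"),
    "gamma":        (100,   1000,  "lin"),
    "lambda_":      (0,     100,   "lin"),
    "lambda_recon": (10,    250,   "lin"),
    "lambda_kl":    (0.001, 0.1,   "log"),
}

def sample_config(rng=random):
    """Sample one configuration, log-uniform where the scale is 'log'."""
    cfg = {}
    for name, (lo, hi, scale) in SEARCH_SPACE.items():
        if scale == "log":
            cfg[name] = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        else:
            cfg[name] = rng.uniform(lo, hi)
    return cfg
```

Each sampled value is guaranteed to fall inside its reported range, so a batch of such configurations (e.g. the 4-16 concurrent runs mentioned above) stays within the published search space.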