Adapting Precomputed Features for Efficient Graph Condensation
Authors: Yuan Li, Jun Hu, Zemin Liu, Bryan Hooi, Jia Chen, Bingsheng He
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our approach achieves comparable or better performance while being 96× to 2,455× faster than SOTA methods, making it more practical for large-scale GNN applications. |
| Researcher Affiliation | Collaboration | ¹National University of Singapore, ²Zhejiang University, ³Grab Taxi Holdings Pte. Ltd. |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations across sections 3.1, 3.2, and 3.3, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code and data are available at https://github.com/Xtra-Computing/GCPA. |
| Open Datasets | Yes | our experiments are conducted on seven benchmark datasets including three smaller networks: CiteSeer, Cora, and PubMed (Kipf & Welling, 2016), and four larger graphs: Ogbn-arxiv, Ogbn-products (Hu et al., 2020), Flickr (Zeng et al., 2020), and Reddit (Hamilton et al., 2017). |
| Dataset Splits | Yes | We use the public data splits for fair comparisons. The dataset statistics and settings are detailed in Table 1. ... Table 1: Summary of dataset statistics (# Train/Val/Test nodes): CiteSeer 120/500/1,000; Cora 140/500/1,000; PubMed 60/500/1,000; Ogbn-arxiv 90,941/29,799/48,603; Ogbn-products 196,615/39,323/2,213,091; Flickr 44,625/22,312/22,313; Reddit 153,431/23,831/55,703. |
| Hardware Specification | Yes | The experiments are conducted on a single NVIDIA H100 GPU (80GB). |
| Software Dependencies | No | The paper mentions the use of GCN and SGC as backbone models, and the AdamW optimizer with its settings (learning rate η = 0.001, β1 = 0.9, β2 = 0.999, and λ = 0.01), but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | For our method, we tune the structure-based precomputation hops K ∈ {1, 2, 3, 4}, damping factor α ∈ {0, 0.25, 0.5, 0.75}, residual coefficient β ∈ {0, 0.25, 0.5, 0.75}, diversity coefficient γ ∈ {0, 0.001, 0.01, 0.1, 1}, semantic-based aggregation size M ∈ {1, 10, 50, 100}, number of negative samples S ∈ {1, 5, 10, 50}, number of adaptation layers ∈ {1, 2, 3}, and hidden dimension of the adaptation module ∈ {128, 256, 512}. We tune all hyperparameters on the validation set. We adopt the default settings of AdamW, including learning rate η = 0.001, β1 = 0.9, β2 = 0.999, and λ = 0.01 for weight decay. |
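For reference, the reported search space and optimizer settings can be written down as a plain-Python sketch. The variable names (`search_space`, `adamw_config`) are illustrative, and the paper does not state whether the search was an exhaustive grid or tuned per hyperparameter; the enumeration below simply shows the size of the full grid implied by the reported value sets.

```python
from itertools import product

# Hyperparameter value sets as reported in the paper's experiment setup.
search_space = {
    "K": [1, 2, 3, 4],                  # structure-based precomputation hops
    "alpha": [0, 0.25, 0.5, 0.75],      # damping factor
    "beta": [0, 0.25, 0.5, 0.75],       # residual coefficient
    "gamma": [0, 0.001, 0.01, 0.1, 1],  # diversity coefficient
    "M": [1, 10, 50, 100],              # semantic-based aggregation size
    "S": [1, 5, 10, 50],                # number of negative samples
    "layers": [1, 2, 3],                # adaptation layers
    "hidden": [128, 256, 512],          # hidden dim of adaptation module
}

# AdamW settings quoted in the review (PyTorch-style keyword names assumed).
adamw_config = {"lr": 1e-3, "betas": (0.9, 0.999), "weight_decay": 0.01}

# Enumerate every candidate configuration in the full Cartesian grid.
configs = [dict(zip(search_space, values))
           for values in product(*search_space.values())]
print(len(configs))  # 46080 configurations in the full grid
```

Selection over this space is done on the validation split, per the paper; in practice one would evaluate each candidate configuration and keep the best validation score rather than materializing all 46,080 combinations up front.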