Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning

Authors: Shengzhong Zhang, Wenjie Yang, Xinyuan Cao, Hongwei Zhang, Zengfeng Huang

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies on various datasets show that StructComp greatly reduces the time and memory consumption while improving model performance compared to the vanilla GCL models and scalable training methods.
Researcher Affiliation | Academia | Shengzhong Zhang, Fudan University, Shanghai, China, EMAIL; Wenjie Yang, Fudan University, Shanghai, China, EMAIL; Xinyuan Cao, Georgia Institute of Technology, Midtown, USA, EMAIL; Hongwei Zhang, Fudan University, Shanghai, China, EMAIL; Zengfeng Huang, Fudan University, Shanghai, China, EMAIL
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. It describes methods in narrative text and mathematical formulations.
Open Source Code | Yes | The remaining hyperparameter settings for each GCL model are listed in our code: https://github.com/szzhang17/StructComp.
Open Datasets | Yes | The results are evaluated on nine real-world datasets (Kipf & Welling, 2017; Veličković et al., 2018; Zhu et al., 2021b; Hu et al., 2020): Cora, Citeseer, Pubmed, Amazon Computers, Amazon Photo, Ogbn-Arxiv, Ogbn-Products and Ogbn-Papers100M. ... More detailed statistics of the nine datasets are summarized in Appendix C.
Dataset Splits | No | On small-scale datasets, including Cora, Citeseer, Pubmed, Amazon Photo and Computers, performance is evaluated on random splits. We randomly select 20 labeled nodes per class for training, while the remaining nodes are used for testing. All results on small-scale datasets are averaged over 50 runs, and standard deviations are reported. For Ogbn-Arxiv, Ogbn-Products and Ogbn-Papers100M, we use fixed data splits as in previous studies (Hu et al., 2020). While training and testing splits are mentioned, an explicit validation split is not described for all datasets.
Hardware Specification | Yes | Experiments are conducted on a server with an NVIDIA 3090 GPU (24 GB memory) and an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz.
Software Dependencies | No | All the algorithms and models are implemented in Python and PyTorch Geometric. The paper names its software stack but does not specify version numbers for Python or PyTorch Geometric.
Experiment Setup | Yes | The key hyperparameter of our framework is the number of clusters, which is set to [300, 300, 2000, 1300, 700, 20000, 25000, 5000] on the datasets, respectively. All models are optimized using the Adam optimizer. The hyperparameters for GCL models trained with StructComp are basically the same as those used for full graph training of GCL models. We show the main hyperparameters in Tables 7 and 8.
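The split protocol quoted under "Dataset Splits" (20 randomly selected labeled nodes per class for training, all remaining nodes for testing) can be sketched as follows. This is a hypothetical helper for illustration, not the authors' code; the function name `per_class_split` and the NumPy-based implementation are assumptions.

```python
import numpy as np

def per_class_split(labels, n_per_class=20, seed=0):
    """Randomly select n_per_class training nodes per class; the rest are test nodes.

    Hypothetical sketch of the split protocol described in the paper:
    20 labeled nodes per class for training, remaining nodes for testing.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx = []
    for c in np.unique(labels):
        class_idx = np.flatnonzero(labels == c)          # node ids of class c
        train_idx.extend(rng.choice(class_idx, size=n_per_class, replace=False))
    train_idx = np.sort(np.asarray(train_idx))
    test_mask = np.ones(labels.shape[0], dtype=bool)
    test_mask[train_idx] = False                         # everything not in train
    return train_idx, np.flatnonzero(test_mask)
```

Averaging over 50 runs, as the paper reports, would correspond to repeating this split with 50 different seeds and reporting mean and standard deviation of the resulting scores.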