Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency

Authors: Xinyu He, Dongqi Fu, Hanghang Tong, Ross Maciejewski, Jingrui He

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, based on temporal heterogeneous graph datasets with up to 1 million nodes and 20 million edges, the experiments show that THEPUFF generates utilizable temporal heterogeneous graphs with privacy protected, compared with state-of-the-art baselines.
Researcher Affiliation Collaboration Xinyu He, Dongqi Fu, Hanghang Tong, Ross Maciejewski, Jingrui He — University of Illinois Urbana-Champaign, Meta AI, Arizona State University. EMAIL, {dongqifu}@meta.com, {rmacieje}@asu.edu
Pseudocode Yes The general graph perturbation process is summarized in Alg. 1 in Appendix A.3. ... A.3 PSEUDO CODES ... Algorithm 1 Graph Perturbation based on Differential Privacy ... Algorithm 2 Privacy-Utility Adversarial Training ... Algorithm 3 Pseudo-code of Dutil() ... Algorithm 4 Pseudo-code of Assembler
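The paper's Algorithm 1 perturbs the graph under differential privacy. As an illustrative stand-in (not the paper's exact procedure), a classic way to privatize an adjacency structure is randomized response: flip each potential edge with probability 1/(1+e^ε), which satisfies ε-edge-level DP. The function name and matrix representation below are my own choices for the sketch.

```python
import math
import random

def perturb_adjacency(adj, epsilon, rng=None):
    """Randomized-response edge perturbation (illustrative sketch only).

    Each undirected edge slot (i, j) is flipped independently with
    probability 1 / (1 + e^epsilon); larger epsilon means fewer flips
    and weaker privacy. This is a generic DP mechanism, not THEPUFF's
    Algorithm 1, which the paper defines in Appendix A.3.
    """
    rng = rng or random.Random(0)
    flip_p = 1.0 / (1.0 + math.exp(epsilon))
    n = len(adj)
    out = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < flip_p:
                # Flip the edge indicator, keeping the matrix symmetric.
                out[i][j] = out[j][i] = 1 - out[i][j]
    return out
```

With a very large ε the flip probability vanishes and the graph passes through almost unchanged, matching the usual privacy–utility trade-off the paper's adversarial training is designed to balance.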
Open Source Code Yes Dataset statistics and more implementation details are summarized in Appendix A.5. Code is at https://github.com/xinyuu-he/THePUff.
Open Datasets Yes Datasets. To test the performance, we utilize 4 real-world publicly-available temporal heterogeneous graph datasets from academic citation graphs (DBLP), online rating graphs (ML-100k, ML-20M), and million-node online shopping graphs (Taobao). ... MovieLens-100k (https://www.kaggle.com/datasets/prajitdatta/movielens-100k-dataset), DBLP (https://www.aminer.org/citation), MovieLens-20M (https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset), and Taobao (https://tianchi.aliyun.com/dataset/649) are publicly available.
Dataset Splits No During the adversarial training, we extract sampled subgraphs (e.g., via random walks) as model inputs. The paper discusses input sampling and mini-batches but does not explicitly state train/test/validation splits for the datasets used in evaluation.
Hardware Specification Yes Machine Configuration. All experiments are performed on a Linux platform with Intel(R) Xeon(R) Gold 6240R CPU and Tesla V100 SXM2 32GB GPU.
Software Dependencies No SGD optimizer is used for discriminators, while RMSprop optimizer is used for the generator; The paper mentions optimizers (SGD, RMSprop) and model architectures (LSTM, tri-level attention networks) but does not provide specific version numbers for any software dependencies like programming languages or libraries.
Experiment Setup Yes Hyperparameters. Table 2 is implemented with the following hyperparameters: ϵ = 8 for all datasets, and ϵ+ is decided by Eq. 4; batch size = 32 for the MovieLens-100K and DBLP datasets, 64 for the other datasets; node embedding dimension = 128; hidden dimensions are all set to 128; dropout rate = 0.2 in the attention layer; learning rate = 1e-4 for the generator and 1e-3 for the discriminators; the SGD optimizer is used for the discriminators, while the RMSprop optimizer is used for the generator; JUST (Hussein et al., 2018) is applied to initialize node embeddings. In the running of JUST, we set the maximum walk length to 100 and sample a maximum of 10 walks starting from each node.
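The reported hyperparameters can be collected into a small configuration sketch. The dictionary layout, key names, and `batch_size` helper below are my own choices for illustration; only the values come from the paper.

```python
# Hyperparameters reported in the paper, gathered into one place.
# Key names are assumptions of this sketch, not the authors' code.
HPARAMS = {
    "epsilon": 8,            # privacy budget, shared across datasets
    "node_dim": 128,         # node embedding dimension
    "hidden_dim": 128,       # all hidden dimensions
    "dropout": 0.2,          # attention-layer dropout
    "lr_generator": 1e-4,    # RMSprop learning rate (generator)
    "lr_discriminator": 1e-3,  # SGD learning rate (discriminators)
    "just_walk_length": 100,   # JUST: maximum walk length
    "just_walks_per_node": 10,  # JUST: walks sampled per node
}

def batch_size(dataset: str) -> int:
    """32 for MovieLens-100K and DBLP, 64 for the other datasets."""
    return 32 if dataset in {"ML-100k", "DBLP"} else 64
```

Note the asymmetry the paper specifies: the discriminators use plain SGD with a larger learning rate, while the generator uses RMSprop with a smaller one, a common stabilization choice in adversarial training.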