Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency
Authors: Xinyu He, Dongqi Fu, Hanghang Tong, Ross Maciejewski, Jingrui He
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, based on temporal heterogeneous graph datasets with up to 1 million nodes and 20 million edges, the experiments show that THEPUFF generates utilizable temporal heterogeneous graphs with privacy protected, compared with state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | Xinyu He, Dongqi Fu, Hanghang Tong, Ross Maciejewski, Jingrui He; University of Illinois Urbana-Champaign, Meta AI, Arizona State University; EMAIL, {dongqifu}@meta.com, {rmacieje}@asu.edu |
| Pseudocode | Yes | The general graph perturbation process is summarized in Alg. 1 in Appendix A.3. ... A.3 PSEUDO CODES ... Algorithm 1 Graph Perturbation based on Differential Privacy ... Algorithm 2 Privacy-Utility Adversarial Training ... Algorithm 3 Pseudo-code of Dutil() ... Algorithm 4 Pseudo-code of Assembler |
| Open Source Code | Yes | Dataset statistics and more implementation details are summarized in Appendix A.5. Code is at https://github.com/xinyuu-he/THePUff |
| Open Datasets | Yes | Datasets. To test the performance, we utilize 4 real-world publicly-available temporal heterogeneous graph datasets from academic citation graphs (DBLP), online rating graphs (ML-100k, ML-20M), and million-node online shopping graphs (Taobao). ... MovieLens-100k, DBLP, MovieLens-20M, and Taobao are publicly available: https://www.kaggle.com/datasets/prajitdatta/movielens-100k-dataset, https://www.aminer.org/citation, https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset, https://tianchi.aliyun.com/dataset/649 |
| Dataset Splits | No | During the adversarial training, we extract sampled subgraphs (e.g., via random walks) as model inputs. The paper discusses input sampling and mini-batches but does not explicitly state train/test/validation splits for the datasets used in evaluation. |
| Hardware Specification | Yes | Machine Configuration. All experiments are performed on a Linux platform with Intel(R) Xeon(R) Gold 6240R CPU and Tesla V100 SXM2 32GB GPU. |
| Software Dependencies | No | "SGD optimizer is used for discriminators, while RMSprop optimizer is used for the generator." The paper mentions optimizers (SGD, RMSprop) and model architectures (LSTM, tri-level attention networks) but does not provide specific version numbers for any software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | Hyperparameters. Table 2 is implemented with the following hyperparameters: ϵ = 8 for all datasets, and ϵ+ is decided by Eq. 4; batch size = 32 for the MovieLens-100K and DBLP datasets, 64 for the other datasets; node embedding dimension = 128; hidden dimensions are all set to 128; dropout rate = 0.2 in the attention layer; learning rate = 1e-4 for the generator and 1e-3 for the discriminators; the SGD optimizer is used for the discriminators, while the RMSprop optimizer is used for the generator. JUST (Hussein et al., 2018) is applied to initialize node embeddings; in the running of JUST, the maximum walk length is 100, and a maximum of 10 walks is sampled starting from each node. |
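To make the privacy setting concrete, the sketch below shows a textbook randomized-response perturbation of an undirected adjacency matrix under edge-level ε-differential privacy, using the report's ε = 8. This is an illustrative stand-in only: the paper's Alg. 1 ("Graph Perturbation based on Differential Privacy") operates on temporal heterogeneous graphs and its exact mechanism is not reproduced here, and the function name `perturb_edges` is our own.

```python
import math
import random

def perturb_edges(adj, epsilon, seed=0):
    """Randomized response on an undirected 0/1 adjacency matrix.

    Each potential edge is flipped independently with probability
    1 / (1 + e^epsilon), which satisfies edge-level epsilon-DP.
    Illustrative sketch, not the paper's Alg. 1.
    """
    rng = random.Random(seed)
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle; mirror below
            bit = adj[i][j]
            if rng.random() >= p_keep:  # flip with prob 1/(1+e^eps)
                bit = 1 - bit
            out[i][j] = out[j][i] = bit
    return out
```

With ε = 8 the flip probability is roughly 1/(1 + e^8) ≈ 0.03%, so most of the graph structure (and hence utility) is preserved while each individual edge retains plausible deniability.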
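The walk-based input sampling used during adversarial training can be sketched with the report's stated limits (maximum walk length 100, up to 10 walks per start node). Note the paper uses JUST, whose heterogeneity-aware jump/stay rule is not modeled here; this simplified uniform walk and the name `sample_walks` are our assumptions.

```python
import random

def sample_walks(neighbors, max_len=100, walks_per_node=10, seed=0):
    """Uniform random walks from every node of a graph.

    neighbors: dict mapping node -> list of adjacent nodes.
    Simplified sketch of walk-based subgraph sampling; the paper's
    JUST initialization additionally alternates between homogeneous
    and heterogeneous edges, which is omitted here.
    """
    rng = random.Random(seed)
    walks = []
    for start in neighbors:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < max_len:
                nbrs = neighbors[walk[-1]]
                if not nbrs:  # dead end: stop the walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```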