Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency

Authors: Xinyu He, Dongqi Fu, Hanghang Tong, Ross Maciejewski, Jingrui He

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, based on temporal heterogeneous graph datasets with up to 1 million nodes and 20 million edges, the experiments show that THEPUFF generates utilizable temporal heterogeneous graphs with privacy protected, compared with state-of-the-art baselines.
Researcher Affiliation Collaboration Xinyu He, Dongqi Fu, Hanghang Tong, Ross Maciejewski, Jingrui He — University of Illinois Urbana-Champaign, Meta AI, Arizona State University. EMAIL, {dongqifu}@meta.com, {rmacieje}@asu.edu
Pseudocode Yes The general graph perturbation process is summarized in Alg. 1 in Appendix A.3. ... A.3 PSEUDO CODES ... Algorithm 1 Graph Perturbation based on Differential Privacy ... Algorithm 2 Privacy-Utility Adversarial Training ... Algorithm 3 Pseudo-code of Dutil() ... Algorithm 4 Pseudo-code of Assembler
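The paper's Algorithm 1 perturbs the graph under differential privacy. As an illustrative stand-in (not the paper's exact procedure), a classic way to privatize an adjacency structure is randomized response: flip each potential edge with probability 1/(1+e^ε), which satisfies ε-edge-level DP. The function name and matrix representation below are my own choices for the sketch.

```python
import math
import random

def perturb_adjacency(adj, epsilon, rng=None):
    """Randomized-response edge perturbation (illustrative sketch only).

    Each undirected edge slot (i, j) is flipped independently with
    probability 1 / (1 + e^epsilon); larger epsilon means fewer flips
    and weaker privacy. This is a generic DP mechanism, not THEPUFF's
    Algorithm 1, which the paper defines in Appendix A.3.
    """
    rng = rng or random.Random(0)
    flip_p = 1.0 / (1.0 + math.exp(epsilon))
    n = len(adj)
    out = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < flip_p:
                # Flip the edge indicator, keeping the matrix symmetric.
                out[i][j] = out[j][i] = 1 - out[i][j]
    return out
```

With a very large ε the flip probability vanishes and the graph passes through almost unchanged, matching the usual privacy–utility trade-off the paper's adversarial training is designed to balance.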
Open Source Code Yes Dataset statistics and more implementation details are summarized in Appendix A.5. Code is at https://github.com/xinyuu-he/THePUff.
Open Datasets Yes Datasets. To test the performance, we utilize 4 real-world publicly-available temporal heterogeneous graph datasets from academic citation graphs (DBLP), online rating graphs (ML-100k, ML-20M), and million-node online shopping graphs (Taobao). ... MovieLens-100k (https://www.kaggle.com/datasets/prajitdatta/movielens-100k-dataset), DBLP (https://www.aminer.org/citation), MovieLens-20M (https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset), and Taobao (https://tianchi.aliyun.com/dataset/649) are publicly available.
Dataset Splits No During the adversarial training, we extract sampled subgraphs (e.g., via random walks) as model inputs. The paper discusses input sampling and mini-batches but does not explicitly state train/test/validation splits for the datasets used in evaluation.
Hardware Specification Yes Machine Configuration. All experiments are performed on a Linux platform with Intel(R) Xeon(R) Gold 6240R CPU and Tesla V100 SXM2 32GB GPU.
Software Dependencies No SGD optimizer is used for discriminators, while RMSprop optimizer is used for the generator; The paper mentions optimizers (SGD, RMSprop) and model architectures (LSTM, tri-level attention networks) but does not provide specific version numbers for any software dependencies like programming languages or libraries.
Experiment Setup Yes Hyperparameters. Table 2 is implemented with the following hyperparameters: ϵ = 8 for all datasets, and ϵ+ is decided by Eq. 4; batch size = 32 for the MovieLens-100K and DBLP datasets, 64 for the other datasets; node embedding dimension = 128; hidden dimensions are all set to 128; dropout rate = 0.2 in the attention layer; learning rate = 1e-4 for the generator and 1e-3 for the discriminators; the SGD optimizer is used for the discriminators, while the RMSprop optimizer is used for the generator; JUST (Hussein et al., 2018) is applied to initialize node embeddings. In the running of JUST, we set the maximum walk length to 100 and sample a maximum of 10 walks starting from each node.
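The reported hyperparameters can be collected into a small configuration sketch. The dictionary layout, key names, and `batch_size` helper below are my own choices for illustration; only the values come from the paper.

```python
# Hyperparameters reported in the paper, gathered into one place.
# Key names are assumptions of this sketch, not the authors' code.
HPARAMS = {
    "epsilon": 8,            # privacy budget, shared across datasets
    "node_dim": 128,         # node embedding dimension
    "hidden_dim": 128,       # all hidden dimensions
    "dropout": 0.2,          # attention-layer dropout
    "lr_generator": 1e-4,    # RMSprop learning rate (generator)
    "lr_discriminator": 1e-3,  # SGD learning rate (discriminators)
    "just_walk_length": 100,   # JUST: maximum walk length
    "just_walks_per_node": 10,  # JUST: walks sampled per node
}

def batch_size(dataset: str) -> int:
    """32 for MovieLens-100K and DBLP, 64 for the other datasets."""
    return 32 if dataset in {"ML-100k", "DBLP"} else 64
```

Note the asymmetry the paper specifies: the discriminators use plain SGD with a larger learning rate, while the generator uses RMSprop with a smaller one, a common stabilization choice in adversarial training.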