Reproducibility Study Of Learning Fair Graph Representations Via Automated Data Augmentations

Authors: Thijmen Nijdam, Juell Sprott, Taiki Papandreou-Lazos, Jurgen de Heus

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
In this study, we undertake a reproducibility analysis of "Learning Fair Graph Representations Via Automated Data Augmentations" by Ling et al. (2022). We assess the validity of the original claims focused on node classification tasks and explore the performance of the Graphair framework in link prediction tasks. Our investigation reveals that we can partially reproduce one of the original three claims and fully substantiate the other two. Additionally, we broaden the application of Graphair from node classification to link prediction across various datasets. Our findings indicate that, while Graphair demonstrates a comparable fairness-accuracy trade-off to baseline models for mixed dyadic-level fairness, it has a superior trade-off for subgroup dyadic-level fairness. These findings underscore Graphair's potential for wider adoption in graph-based learning.
Researcher Affiliation: Academia
Thijmen Nijdam (EMAIL), University of Amsterdam; Juell Sprott (EMAIL), University of Amsterdam; Taiki Papandreou-Lazos (EMAIL), University of Amsterdam; Jurgen de Heus (EMAIL), University of Amsterdam
Pseudocode: No
The paper describes the Graphair framework and its components using mathematical formulations (e.g., equations for Ladv, Lcon, Lreconst) and high-level descriptions. Figure 5 in Appendix A.1 provides an "Overview of the Graphair framework", which is a diagram, not structured pseudocode or an algorithm block. No section is explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code: Yes
Our code base can be found on GitHub at https://github.com/juellsprott/graphair-reproducibility.
Open Datasets: Yes
To replicate the main claims, we use the same datasets as the original paper by Ling et al. (2022), which include specific dataset splits and sensitive and target attributes. We employ three real-world graph datasets: NBA [1], containing player statistics, and two subsets of the Pokec social network from Slovakia, namely Pokec-n and Pokec-z (Dai & Wang, 2021). The specifics of these datasets are summarized in Table 1. For the link prediction task, we utilize well-established benchmark datasets in this domain: Cora, Citeseer, and Pubmed (Spinelli et al., 2021; Chen et al., 2022; Current et al., 2022; Li et al., 2021). [1] https://www.kaggle.com/datasets/noahgift/social-power-nba
Dataset Splits: Yes
To replicate the main claims, we use the same datasets as the original paper by Ling et al. (2022), which include specific dataset splits and sensitive and target attributes. We employ three real-world graph datasets: NBA [1], containing player statistics, and two subsets of the Pokec social network from Slovakia, namely Pokec-n and Pokec-z (Dai & Wang, 2021). For the link prediction task, we utilize well-established benchmark datasets in this domain: Cora, Citeseer, and Pubmed (Spinelli et al., 2021; Chen et al., 2022; Current et al., 2022; Li et al., 2021). These datasets feature scientific publications as nodes with bag-of-words vectors of abstracts as node features. Edges represent citation links between publications. We form dyadic groups that relate sensitive node attributes to link attributes, following the mixed and subgroup dyadic-level fairness principles suggested by Masrour et al. (2020).
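The dyadic grouping described above can be illustrated with a minimal sketch (not the authors' code). Under the usual reading of Masrour et al. (2020), mixed dyadic-level fairness separates intra-group links (same sensitive attribute at both endpoints) from inter-group links, while subgroup dyadic-level fairness keeps each attribute combination as its own group; all names and the toy data here are illustrative.

```python
# Illustrative sketch: forming dyadic groups for link-level fairness
# from a binary sensitive node attribute. Not the authors' implementation.
def mixed_dyadic_group(s_u, s_v):
    """Mixed dyadic grouping: intra-group vs. inter-group links."""
    return "intra" if s_u == s_v else "inter"

def subgroup_dyadic_group(s_u, s_v):
    """Subgroup dyadic grouping: one group per (unordered) attribute pair."""
    return tuple(sorted((s_u, s_v)))

# Toy graph: sensitive attribute per node and a few edges.
sensitive = {0: 0, 1: 1, 2: 0, 3: 1}
edges = [(0, 2), (0, 1), (1, 3)]

mixed = [mixed_dyadic_group(sensitive[u], sensitive[v]) for u, v in edges]
subgroups = [subgroup_dyadic_group(sensitive[u], sensitive[v]) for u, v in edges]
# mixed     -> ['intra', 'inter', 'intra']
# subgroups -> [(0, 0), (0, 1), (1, 1)]
```

Fairness metrics are then computed by comparing link-prediction scores across these groups.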
Hardware Specification: Yes
All of our experiments are conducted on a high-performance computing (HPC) cluster that features NVIDIA A100 GPUs, divided into four partitions with a combined memory of 40 GB.
Software Dependencies: No
We obtain the model's codebase from the DIG library, more specifically, from the FairGraph module [2]. To enhance reproducibility, we employ complete seeding across all operations, which was missing in some operations of the original code. A key difference between the experimental setup reported by the original authors and ours is that we conducted a 10,000-epoch grid search for the Pokec dataset, instead of the 500-epoch grid search initially reported by Ling et al. (2022). This modification was recommended by the original authors to enhance reproducibility. Maximum GPU memory usage is determined by the max_memory_allocated method from the PyTorch library.
Experiment Setup: Yes
Node Classification: To align our experiments closely with the original study, we adopt the hyperparameters specified by the authors. This includes conducting a grid search on the hyperparameters α, γ, and λ with the values {0.1, 1.0, 10.0}, as performed in the original work (Ling et al., 2022). We use the default settings from the original code where specific hyperparameters are not disclosed, a choice validated by the original authors. A complete list of all hyperparameters is provided in Table 7 in subsection A.2.
Link Prediction: We replicate the grid search from the node classification experiments for link prediction on the Citeseer, Cora, and PubMed datasets. Initially, we conduct a grid search on the model parameters, including varying the number of epochs, the learning rates for both the Graphair module and the classifier, and the sizes of the hidden layers for both components. We select the most notable model setup based on performance metrics (accuracy and ROC) and fairness values, and then perform a subsequent grid search on the loss hyperparameters α, λ, and γ to fine-tune the model further.
Table 7: Overview of hyperparameters for the Graphair model m and the evaluation classifier c on all datasets. Table 8: Overview of all hyperparameters tuned in the grid search.
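The grid search over the loss hyperparameters α, γ, and λ described above can be sketched as follows. This is a minimal illustration, not the reproduction code: evaluate_model is a hypothetical placeholder for training Graphair with one configuration and returning a validation score (e.g., accuracy adjusted by a fairness penalty).

```python
import itertools

# Hypothetical stand-in for training Graphair with one (alpha, gamma, lam)
# configuration and returning a validation score. A real run would train
# the model and evaluate accuracy and fairness on held-out data.
def evaluate_model(alpha, gamma, lam):
    # Placeholder score so the sketch is runnable end to end.
    return -(abs(alpha - 1.0) + abs(gamma - 1.0) + abs(lam - 1.0))

grid = [0.1, 1.0, 10.0]  # values used in the original grid search
best_cfg, best_score = None, float("-inf")
for alpha, gamma, lam in itertools.product(grid, grid, grid):
    score = evaluate_model(alpha, gamma, lam)
    if score > best_score:
        best_cfg, best_score = (alpha, gamma, lam), score

# With the placeholder score, the search selects (1.0, 1.0, 1.0).
```

The same loop shape extends to the link-prediction search by adding epochs, learning rates, and hidden sizes as further axes of itertools.product.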