Reproducibility study of FairAC

Authors: Gijs de Jong, Macha J. Meijer, Derck W. E. Prinzhorn, Harold Ruiter

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work aims to reproduce the findings of the paper "Fair Attribute Completion on Graph with Missing Attributes" by Guo, Chu, and Li [1] by investigating the claims made in that paper. This study suggests that the results of the original paper are reproducible and that its claims therefore hold. However, the claim that FairAC is a generic framework for many downstream tasks is very broad and could only be partially tested. Moreover, we show that FairAC generalizes to various datasets and sensitive attributes, and we present evidence that the improvement in group fairness achieved by the FairAC framework does not come at the expense of individual fairness. Lastly, the FairAC codebase has been refactored and is now easily applicable to various datasets and models. Table 1 shows the reproduced results of the original study, together with the performance on individual fairness, which is analyzed in Section 4.2.
Researcher Affiliation | Academia | Gijs de Jong (EMAIL), University of Amsterdam; Macha J. Meijer (EMAIL), University of Amsterdam; Derck W. E. Prinzhorn (EMAIL), University of Amsterdam; Harold Ruiter (EMAIL), University of Amsterdam
Pseudocode | Yes | Algorithm 1: Model training. Input: G = (V, E, X), S. Output: auto-encoder f_AE, sensitive classifier C_s, attribute completion f_AC.
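To illustrate the attribute-completion component f_AC named in Algorithm 1, the sketch below fills the embedding of a node whose attributes were dropped by aggregating the embeddings of neighbours that kept theirs. This is a toy stand-in, not the authors' code: the real framework uses an attention-based aggregation, whereas a plain mean is used here for brevity, and all names and dimensions are illustrative assumptions.

```python
# Toy sketch of attribute completion (f_AC in Algorithm 1): nodes with
# dropped attributes get an embedding averaged over neighbours that kept
# theirs. The real FairAC aggregates with attention; a mean is used here.

def complete_embeddings(emb, dropped, neighbours):
    """emb: node id -> embedding (list of floats); dropped: set of node ids
    whose attributes were masked; neighbours: node id -> list of node ids."""
    completed = dict(emb)
    for v in dropped:
        kept = [u for u in neighbours.get(v, []) if u not in dropped]
        if not kept:  # no usable neighbour: leave the embedding as-is
            continue
        dim = len(emb[kept[0]])
        completed[v] = [sum(emb[u][d] for u in kept) / len(kept)
                        for d in range(dim)]
    return completed

emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [9.0, 9.0]}
neighbours = {2: [0, 1], 0: [2], 1: [2]}
filled = complete_embeddings(emb, dropped={2}, neighbours=neighbours)
# filled[2] is the mean of the embeddings of nodes 0 and 1: [0.5, 0.5]
```

In the full algorithm this completion is trained jointly with the auto-encoder f_AE and adversarially against the sensitive classifier C_s, so the completed embeddings carry as little sensitive information as possible.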
Open Source Code | Yes | The source code for this reproduction has been made available (see Appendix D for more details). In addition to creating the library, the implementation has been optimized, reducing runtime by about 33%. The source is published at https://github.com/oxkitsune/fact. A complete refactor of the original codebase has been carried out to make the framework significantly easier to apply to various downstream tasks.
Open Datasets | Yes | NBA dataset. This is an extended version of a Kaggle dataset containing performance statistics of approximately 400 NBA basketball players from the 2016–2017 season. It includes personal information such as age, nationality, gender and salary [7], with player interactions on Twitter defining the network relationships. The node label indicates whether a player's salary is above the median, and nationality serves as the sensitive attribute, intended to be excluded from the embeddings. Pokec datasets. Derived from Pokec, a Slovakian social network [14], the Pokec-z and Pokec-n datasets are adapted from Dai and Wang [7]. While the original FairAC paper used region as the sensitive attribute for both Pokec datasets, our study also evaluates age and gender. Age is converted to a binary variable indicating whether a person is younger than 21 years, similar to the approach in the credit dataset [15]. Additional datasets, German Credit and Recidivism, were also used to evaluate FairAC [15].
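The age binarization described above is a one-line transformation; a minimal sketch, assuming a plain list of ages (the actual column handling in the Pokec data is not shown in the text):

```python
# Binarize the Pokec age attribute: 1 if younger than 21, else 0,
# mirroring the threshold described above. Toy values for illustration.
ages = [15, 21, 34, 19]
age_binary = [1 if a < 21 else 0 for a in ages]
# age_binary == [1, 0, 0, 1]
```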
Dataset Splits | Yes | For FairAC training, the first 25% of each dataset was used, a limitation imposed by memory constraints of the machines used for the original paper. The subsequent GNN training used 50% of the data, with the remaining 25% reserved for evaluation. All data splits follow those of the original paper.
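The 25% / 50% / 25% split above can be sketched as follows; this is a simple sequential split over node indices, and whether the original implementation shuffles indices first is an assumption left out here:

```python
# Sequential 25/50/25 split of node indices, as described above:
# first 25% for FairAC training, next 50% for GNN training,
# final 25% for evaluation.
def split_indices(n_nodes):
    a = int(0.25 * n_nodes)
    b = int(0.75 * n_nodes)
    idx = list(range(n_nodes))
    return idx[:a], idx[a:b], idx[b:]

ac_idx, gnn_idx, eval_idx = split_indices(400)
# lengths: 100, 200, 100
```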
Hardware Specification | Yes | All experiments were performed on a single NVIDIA Titan RTX GPU. Training one FairAC model takes about 30 minutes on this GPU, while training a (Fair)GNN model takes about 10 minutes.
Software Dependencies | No | The FairAC implementation is openly accessible, but the baseline code for GCN and FairGNN is not included in that repository. To address this, we integrated these separate, publicly available codebases into a unified framework. FairAC uses topological input embeddings created with DeepWalk [20].
Experiment Setup | Yes | To reproduce the experiments of the original authors as closely as possible, the hyperparameters from the original paper were used when available; missing hyperparameters were adapted from the provided shell scripts. By default, all models were trained for 3000 epochs, including 200 epochs of auto-encoder pretraining, using an Adam optimizer with an initial learning rate of 0.001 and a weight decay of 1×10⁻⁵. During training, a dropout of 0.5 is applied. The main auto-encoder has a hidden dimension of 128 and uses one attention head. The feature drop rate (α) is 0.3 unless mentioned otherwise. β is 1.0 for all datasets, except for pokec-n, where β is 0.5. The accuracy and AUC thresholds were set to 0.65 and 0.69, respectively, for all datasets. We assumed the original experiments were run on three different seeds; since only one seed appears in the available scripts, we used the seeds 40, 41 and 42, per the authors' advice.
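The hyperparameters listed above can be collected into a single configuration for reference; the key names are illustrative, not taken from the authors' code:

```python
# Hyperparameters gathered from the experiment setup described above.
# Key names are illustrative assumptions; values come from the text.
HPARAMS = {
    "epochs": 3000,
    "pretrain_epochs": 200,       # auto-encoder pretraining
    "optimizer": "Adam",
    "lr": 1e-3,
    "weight_decay": 1e-5,
    "dropout": 0.5,
    "hidden_dim": 128,            # main auto-encoder
    "attention_heads": 1,
    "alpha_feature_drop": 0.3,    # unless mentioned otherwise
    "beta": {"default": 1.0, "pokec-n": 0.5},
    "acc_threshold": 0.65,
    "auc_threshold": 0.69,
    "seeds": [40, 41, 42],
}
```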