Counterfactual Fairness on Graphs: Augmentations, Hidden Confounders, and Identifiability
Authors: Hongyi Ling, Zhimeng Jiang, Na Zou, Shuiwang Ji
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of our method in improving the counterfactual fairness of classifiers on various graph tasks. Moreover, theoretical analysis, coupled with empirical results, illustrates the capability of our method to successfully identify hidden confounders. |
| Researcher Affiliation | Academia | Hongyi Ling, Department of Computer Science & Engineering, Texas A&M University; Zhimeng Jiang, Department of Computer Science & Engineering, Texas A&M University; Na Zou, Department of Industrial Engineering, University of Houston; Shuiwang Ji, Department of Computer Science & Engineering, Texas A&M University |
| Pseudocode | No | The paper describes its methodology in Section 3 and provides theoretical proofs in Section 4 and Appendix A, but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | We further demonstrate the advance of our method using three real-world datasets (Agarwal et al., 2021), including German, Credit, and Bail. |
| Dataset Splits | Yes | Each dataset is randomly partitioned into training, validation, and test sets, at proportions of 80%, 10%, and 10%, respectively. In the synthetic datasets, we can fully manipulate the data generation process and thus easily generate the counterfactual graphs. For each node, we flip its sensitive attribute to obtain a new sensitive attribute vector S′. Counterfactual graphs are then generated as G(S←S′) = {A(S←S′), X(S←S′), S′}, where A(S←S′) = F_A(Z, S′) and X(S←S′) = F_X(Z, A(S←S′), S′). See Appendix D.1 for more details. For all three datasets, we randomly split 80%/10%/10% for training, validation, and test datasets. |
| Hardware Specification | Yes | We use NVIDIA RTX A6000 GPUs for all our experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2015), GCN (Kipf & Welling, 2017), and Tetrad (Ramsey et al., 2018), but it does not specify version numbers for these or any other software libraries or frameworks. |
| Experiment Setup | Yes | For the classification model, we use a GCN model. The number of GCN layers is two, and we use global mean pooling as the readout function. We set the hidden size to 16. The activation function is ReLU. We use the Adam optimizer (Kingma & Ba, 2015) to train the classification model with a 1 × 10⁻⁴ learning rate and 1 × 10⁻⁴ weight decay. In our experiments on synthetic datasets, we set the dimensionality of the hidden confounders and the number of components to match the data generation process's ground truth. For real-world datasets, we align the dimensionality of the hidden confounders with that of the node features, setting the number of components to eight. |
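The 80%/10%/10% random split quoted in the Dataset Splits row can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code; the function name, seed handling, and interface are assumptions.

```python
import numpy as np

def random_split(num_nodes, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Randomly partition node indices into train/val/test sets.

    `fractions` follows the 80%/10%/10% proportions stated in the paper;
    the seed and function signature are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)           # shuffle all node indices
    n_train = int(fractions[0] * num_nodes)
    n_val = int(fractions[1] * num_nodes)
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]               # remainder goes to test
    return train, val, test

train, val, test = random_split(1000)
print(len(train), len(val), len(test))  # 800 100 100
```

The three index arrays are disjoint by construction, since they are contiguous slices of one permutation.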
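The classifier described in the Experiment Setup row (two GCN layers, hidden size 16, ReLU, global mean-pooling readout) can be sketched as a forward pass in plain NumPy. This is a minimal sketch under the standard GCN propagation rule of Kipf & Welling (2017); the random-data setup below is illustrative, and training with Adam (learning rate and weight decay 1 × 10⁻⁴) is not shown.

```python
import numpy as np

def gcn_forward(A, X, W1, W2):
    """Two-layer GCN with ReLU and a global mean-pooling readout.

    A: (n, n) symmetric adjacency without self-loops
    X: (n, d_in) node features; W1: (d_in, 16); W2: (16, n_classes)
    """
    # Symmetrically normalized adjacency with self-loops: D^{-1/2}(A+I)D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    H = np.maximum(A_norm @ X @ W1, 0.0)  # layer 1 + ReLU (hidden size 16)
    H = A_norm @ H @ W2                   # layer 2: per-node class scores
    return H.mean(axis=0)                 # global mean-pooling readout

# Illustrative random graph and weights (shapes match the quoted setup).
rng = np.random.default_rng(0)
n, d_in, d_hid, n_cls = 5, 8, 16, 2
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric, no self-loops
X = rng.standard_normal((n, d_in))
W1 = rng.standard_normal((d_in, d_hid)) * 0.1
W2 = rng.standard_normal((d_hid, n_cls)) * 0.1
print(gcn_forward(A, X, W1, W2).shape)  # (2,)
```

A practical implementation would use a GNN library rather than dense matrices, but the dense form makes the normalization and readout explicit.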