Reassessing Fairness: A Reproducibility Study of NIFA’s Impact on GNN Models
Authors: Ruben Figge, Sjoerd Gunneweg, Aaron Kuin, Mees Lindeman
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we reproduce and evaluate NIFA across multiple datasets and GNN architectures. Our findings confirm that NIFA consistently degrades fairness measured via Statistical Parity and Equal Opportunity while maintaining utility on classical GNNs. |
| Researcher Affiliation | Academia | Ruben Figge (EMAIL), University of Amsterdam; Sjoerd Gunneweg (EMAIL), University of Amsterdam; Aaron Kuin (EMAIL), University of Amsterdam; Mees Lindeman (EMAIL), University of Amsterdam |
| Pseudocode | No | The paper describes methodologies and loss functions (Appendix A) but does not present any explicit pseudocode blocks or algorithms in a structured format. |
| Open Source Code | Yes | The codebase is publicly available at: https://github.com/sjoerdgunneweg/Reassessing-NIFA. |
| Open Datasets | Yes | Original Datasets. The experiments conducted in the original paper utilized three real-world datasets: Pokec-z, Pokec-n, and DBLP. Pokec-z and Pokec-n are subsets of the Pokec network (https://snap.stanford.edu/data/soc-pokec.html). According to Takac & Zabovsky (2012), Pokec is the most popular social network in Slovakia and is also widely used in the Czech Republic. In this dataset, nodes represent users, while edges represent unidirectional following relationships (Luo et al., 2024). The DBLP dataset is a citation dataset containing a digital library with comprehensive coverage of database literature (Elmacioglu & Lee, 2005). Synthetic Dataset. To investigate the impact of homophily rates, we construct a synthetic dataset based on the work by Espín-Noboa et al. (2022). |
| Dataset Splits | Yes | We divide the dataset into train (70%), validation (15%), and test (15%) splits. Each split preserves the original distribution of sensitive attributes and proxy labels to minimize sampling bias. |
| Hardware Specification | Yes | All our experiments were conducted utilizing NVIDIA A100 GPUs with CUDA 12.1.1. |
| Software Dependencies | Yes | Specifically, we used Python 3.10.16 and PyTorch 2.4.0. For the Deep Graph Library (DGL), we employed version 2.4.0, which requires PyTorch 2.4.0, a constraint we adhered to for compatibility. Additionally, we incorporated the PyTorch Geometric (PyG) library version 2.6.1, along with auxiliary libraries like torch-scatter and torch-sparse, all compatible with CUDA 12.1.1. |
| Experiment Setup | Yes | Constraints on the number of injected nodes (\|V_I\| ≤ b) and their connectivity (deg(v) ≤ d for all v ∈ V_I) ensure that the attack remains unnoticeable and deceptive to the defenders; both b and d are predefined budgets. Uncertainty scores are computed as the variance of predictions across samples... Nodes with the top k% uncertainty within each sensitive group are selected, with k as a tunable hyperparameter. Injected nodes are distributed among sensitive groups and connect to d random target nodes within the same group, where d is a hyperparameter. Overall loss. The overall loss for optimizing the injected nodes' features is defined as: L = L_CE + α·L_CF + β·(L_SP + L_EO), where α and β are hyperparameters controlling the weight of each term. For each iteration, a source node v_i was selected according to its activity level, and a target node v_j was chosen with a probability proportional to the product of its in-degree k_j^in and the homophily h_ij between the source and target. The expected number of edges in the graph was scaled by the parameter d, ensuring that the total number of edges approximated d·N·(N−1). D.2 Experimental settings. The number of nodes was set to N = 2000, and the edge density d was maintained at 0.0015, ensuring sparse graphs suitable for modeling real-world networks. The node activity followed a power-law out-degree distribution with exponents plo_M = plo_m = 2.5, reflecting heterogeneous connectivity patterns across groups. The number of feature dimensions needed for the proxy task was set to n = 100. Finally, all NIFA attacks were conducted using a perturbation rate of 1%, as per the original paper. |
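The overall loss L = L_CE + α·L_CF + β·(L_SP + L_EO) and the homophily-weighted target sampling from the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default α/β values, and the use of `random.choices` are assumptions for the sake of a runnable example.

```python
import random

def nifa_total_loss(l_ce, l_cf, l_sp, l_eo, alpha=1.0, beta=1.0):
    """Sketch of the overall loss L = L_CE + alpha*L_CF + beta*(L_SP + L_EO).

    alpha weights the counterfactual-fairness term and beta weights the
    statistical-parity / equal-opportunity terms; the defaults here are
    placeholders, not the paper's tuned values.
    """
    return l_ce + alpha * l_cf + beta * (l_sp + l_eo)

def sample_target(in_degrees, homophilies):
    """Pick a target node j with probability proportional to k_j^in * h_ij,
    as in the synthetic-graph construction. Returns the chosen index."""
    weights = [k * h for k, h in zip(in_degrees, homophilies)]
    return random.choices(range(len(weights)), weights=weights, k=1)[0]
```

In practice the four loss terms would be differentiable tensors (e.g. PyTorch scalars) so that gradients flow back to the injected nodes' features; the same weighted-sum structure applies unchanged.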