reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fairness-Aware Dense Subgraph Discovery

Authors: Emmanouil Kariotakis, Nicholas D Sidiropoulos, Aritra Konar

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on real-world datasets demonstrate that our methods not only match but frequently outperform existing solutions, sometimes incurring even less than half the subgraph density loss compared to prior art, while achieving the target fairness levels. Importantly, they excel in scenarios that previous methods fail to adequately handle, i.e., those with extreme subgroup imbalances, highlighting their effectiveness in extracting fair and dense solutions. ... 7 Experimental Evaluation: We consider various real-world datasets, with Table 1 summarizing their statistics (for more information, see Appendix F). The implementation of our approach is publicly available on Github1.
Researcher Affiliation	Academia	Emmanouil Kariotakis EMAIL ESAT-STADIUS KU Leuven Nicholas D. Sidiropoulos EMAIL Department of Electrical Engineering University of Virginia Aritra Konar EMAIL ESAT-STADIUS KU Leuven
Pseudocode	Yes	Additionally, it is straightforward to implement it using the objective functions of FADSG-I and II (for more details, see [Algorithm 1, Appendix E]). ... Pseudo-code of the algorithm is provided in (Algorithm 1).
Open Source Code	Yes	The implementation of our approach is publicly available on Github1. 1https://github.com/ekariotakis/fadsg_tmlr2025
Open Datasets	Yes	We consider various real-world datasets, with Table 1 summarizing their statistics (for more information, see Appendix F). ... Political Books Dataset (Pol Books) ... Political Blogs Dataset (Pol Blogs) (Adamic & Glance, 2005) ... Amazon Products Metadata (Amazon) (Ni et al., 2019) ... Last FM Asia Social Network (Last FM) (Rozemberczki & Sarkar, 2020) ... Deezer Europe Social Network (Deezer) (Rozemberczki & Sarkar, 2020) ... Git Hub Developers (Git Hub) (Rozemberczki et al., 2019) ... Twitch Users Social Networks (Twitch) (Rozemberczki et al., 2019) ... Twitter Retweet Political Network (Twitter) (Rossi & Ahmed, 2015; Conover et al., 2011)
Dataset Splits	No	The paper lists multiple datasets and their summary statistics in Table 1 and Appendix F, but it does not provide specific details on how these datasets were split into training, validation, or test sets for the experiments.
Hardware Specification	Yes	All experiments were performed on a single machine, with Intel i7-13700K CPU @ 3.4GHz and 128GB of main memory running Python 3.12.0.
Software Dependencies	Yes	All experiments were performed on a single machine, with Intel i7-13700K CPU @ 3.4GHz and 128GB of main memory running Python 3.12.0.
Experiment Setup	Yes	In order to find the appropriate λ value for the desired target fairness level that we had for each experiment, we performed bisection on λ values. The lower bound was always 0 and the upper was λmax. The value of λmax was computed experimentally for each dataset. We specifically chose the value of λmax based on our observation that it led to the densest protected subgraph for FADSG-I and the entire protected subgraph for FADSG-II. ... Furthermore, this bisection was performed with a tolerance on λ, ε = 10-9. ... we determined that a value of T = 5 iterations provided satisfactory results across all our datasets in our implementation of Super-Greedy++.