Accelerating Spectral Clustering under Fairness Constraints
Authors: Francesco Tonin, Alex Lambert, Johan Suykens, Volkan Cevher
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments demonstrate the effectiveness of our approach on both synthetic and real-world benchmarks, showing significant speedups in computation time over prior art, especially as the problem size grows. This work thus represents a considerable step forward towards the adoption of fair clustering in real-world applications. |
| Researcher Affiliation | Academia | 1LIONS, EPFL, Switzerland 2ESAT-STADIUS, KU Leuven, Belgium. Correspondence to: Francesco Tonin <EMAIL>. |
| Pseudocode | Yes | Our algorithm is summarized in Algorithm 1. The penalty α(i+1) is updated according to the standard rule suggested in (Boyd et al., 2011) and detailed in Appendix B. |
| Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is being released, nor does it provide any links to a code repository. |
| Open Datasets | Yes | We consider the following synthetic datasets: m-SBM, Rand Laplace, and Elliptical. The m-SBM benchmark (Kleindessner et al., 2019) is a modification of the stochastic block model (Holland et al., 1983) to take fairness into account. ... Elliptical has k = 2 clusters and h = 2 groups, as defined in (Feng et al., 2024). We consider the following real-world datasets, summarized in Table 10 in Appendix. LastFMNet (Rozemberczki & Sarkar, 2020) ... Thyroid (Quinlan, 1987) ... Census (Kohavi et al., 1996) ... 4area (Ahmadian et al., 2019). |
| Dataset Splits | No | The paper describes how synthetic datasets are generated (e.g., 'Rand Laplace dataset is a graph induced by a random n × n symmetric adjacency matrix W with each node randomly assigned to one of h = 2 groups') and details parameter settings for the experiments (e.g., 'm-SBM benchmark with k = 50, h = 5 and varying n'). However, it does not explicitly provide training/test/validation dataset splits as typically used in supervised machine learning tasks for reproducibility. The evaluation focuses on the clustering results and runtime. |
| Hardware Specification | Yes | Experiments are implemented in Python 3.10 on a machine with a 3.6GHz Intel i7-9700K processor and 64GB RAM. |
| Software Dependencies | No | Experiments are implemented in Python 3.10... Regarding optimization of the dual DC problem, we employ the LBFGS algorithm in the scipy implementation... The compared methods o-FSC (Kleindessner et al., 2019) and s-FSC (Wang et al., 2023) employ the scipy eigendecomposition routine. The paper mentions Python 3.10 as a specific version, but for other software like `scipy`, it only mentions the library name without a version number, which is insufficient for full reproducibility. |
| Experiment Setup | Yes | Regarding optimization of the dual DC problem, we employ the L-BFGS algorithm in the scipy implementation with backtracking line search using the strong Wolfe conditions, with initialization from the standard normal distribution; the stopping condition is given by gtol = 10⁻³, ftol = 10⁻⁴. The ADMM algorithm is run with α(0) = 0.005, T = 10. The compared methods o-FSC (Kleindessner et al., 2019) and s-FSC (Wang et al., 2023) employ the scipy eigendecomposition routine. For s-FSC, we use ‖L̂‖₁ as shift, as suggested in their paper. The k-means algorithm is run to obtain the clustering indicators for all compared methods. ... The update rule for α in Algorithm 1 is standard in ADMM (Boyd et al., 2011): α(i+1) = τα(i) if ‖R(i)‖_F > µ‖S(i)‖_F, α(i)/τ if ‖S(i)‖_F > µ‖R(i)‖_F, and α(i) otherwise, with primal and dual residuals R(i) = MH(i) − Y(i) and S(i) = α(i)(Y(i) − Y(i+1)). The parameters τ and µ are set to 2 and 10, respectively. |
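The optimizer settings quoted in the Experiment Setup row (scipy L-BFGS with a strong-Wolfe line search, gtol = 10⁻³, ftol = 10⁻⁴, standard-normal initialization) could be reproduced roughly as follows. This is a sketch only: `dual_dc_objective` below is a hypothetical quadratic stand-in for the paper's actual dual DC objective, which the report does not spell out.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def dual_dc_objective(x):
    # Hypothetical stand-in for the paper's dual DC objective:
    # a smooth quadratic, used only to illustrate the solver call.
    # Returns (function value, gradient) so that jac=True works.
    return 0.5 * float(np.dot(x, x)), x

x0 = rng.standard_normal(5)  # initialization from the standard normal distribution
res = minimize(
    dual_dc_objective,
    x0,
    jac=True,                 # objective returns (f, grad)
    method="L-BFGS-B",        # scipy's L-BFGS; its line search enforces strong Wolfe conditions
    options={"gtol": 1e-3, "ftol": 1e-4},
)
print(res.fun)
```

Note that scipy's `ftol` for L-BFGS-B is a relative tolerance on the objective decrease, so whether it matches the paper's stopping rule exactly depends on their convention.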
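The residual-balancing update for the ADMM penalty α quoted in the Experiment Setup row can be sketched as a small helper. The values `tau=2` and `mu=10` follow the paper; the function itself is an illustrative reading of the standard rule from Boyd et al. (2011), not the authors' code.

```python
def update_penalty(alpha, r_norm, s_norm, tau=2.0, mu=10.0):
    """Standard ADMM residual-balancing rule (Boyd et al., 2011).

    alpha  : current penalty parameter alpha(i)
    r_norm : Frobenius norm of the primal residual ||R(i)||_F
    s_norm : Frobenius norm of the dual residual   ||S(i)||_F
    """
    if r_norm > mu * s_norm:   # primal residual dominates -> increase penalty
        return tau * alpha
    if s_norm > mu * r_norm:   # dual residual dominates -> decrease penalty
        return alpha / tau
    return alpha               # residuals balanced -> keep penalty
```

With the paper's α(0) = 0.005, a dominant primal residual would double the penalty to 0.01, and a dominant dual residual would halve it to 0.0025.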