reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fair Representation in Submodular Subset Selection: A Pareto Optimization Approach

Authors: Adriano Fazzone, Yanhao Wang, Francesco Bonchi

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In the empirical assessment, we evaluate our proposed framework for the problems of maximum coverage and recommendation using real-world data. The numerical results confirm the effectiveness of our proposal compared to competitive baselines. ... In this section, we present extensive experimental results to evaluate the performance of our proposed algorithm (SMFR-Saturate) on two benchmark problems, namely Maximum Coverage and Recommendation, using several real-world data sets.
Researcher Affiliation	Academia	Adriano Fazzone EMAIL CENTAI Institute, Turin, Italy Yanhao Wang EMAIL East China Normal University, Shanghai, China Francesco Bonchi EMAIL CENTAI Institute, Turin, Italy Eurecat, Barcelona, Spain
Pseudocode	Yes	Algorithm 1: SMFR-Saturate
Open Source Code	Yes	Data and code are available publicly at https://github.com/adrianfaz/ Fair-Representation-in-Submodular-Subset-Selection-A-Pareto-Optimization-Approach.
Open Datasets	Yes	The Facebook data set (Mislove et al., 2010) is an undirected graph of 1, 216 nodes and 42, 443 edges representing the friendships between Rice University students on Facebook, and the DBLP data set (Dong et al., 2023) is an undirected graph of 3, 980 nodes and 6, 966 edges denoting the coauthorships between researchers. ... The X-Wines data set consists of 150 000 ratings from 10 561 users on 1 007 wines ... The Movie Lens data set consists of 100 000 ratings from 600 users on 9 000 movies ... The Pokec (Kosicky) data set is an undirected graph with 234, 320 nodes and 2, 417, 175 edges, extracted from the Pokec8 data set. ... The Movie Lens-25M 9 data set is a larger version of the Movie Lens data set presented in Section 5.
Dataset Splits	No	The paper discusses how nodes are partitioned into communities or groups for defining representativeness functions and constraints (e.g., 'dividing all nodes into 4 disjoint groups based on a sensitive attribute'). It does not provide explicit details on training, validation, or test dataset splits typically used for model evaluation or reproduction.
Hardware Specification	Yes	All experiments were run on a Mac Book Pro laptop with an Apple M1 Max processor and 32GB memory running Mac OS 14.
Software Dependencies	No	We implemented them in Python 3, and for the OPT algorithm, we applied the Gurobi3 optimizer to solve the ILP formulations of the Maximum Coverage and Recommendation instances. All algorithms except OPT were accelerated using the lazy-forward strategy (Leskovec et al., 2007). ... 5https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
Experiment Setup	Yes	We fix k = 40 for the knapsack constraint and k = 5 (and thus r = 20) for the matroid constraint. We set d = 1, 2, and 4 by considering the representativeness functions on the first group C1, the first two groups C1 and C2, and all four groups from C1 to C4. ... We also set d = 1, 2, and 4 by considering the representativeness functions on C1, C1&C2, and C1 C4. Only solutions with β 0.8 are considered for d = 1, β 0.4 for d = 2, and β 0.2 for d = 4.