Fair Representation in Submodular Subset Selection: A Pareto Optimization Approach
Authors: Adriano Fazzone, Yanhao Wang, Francesco Bonchi
TMLR 2024 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the empirical assessment, we evaluate our proposed framework for the problems of maximum coverage and recommendation using real-world data. The numerical results confirm the effectiveness of our proposal compared to competitive baselines. ... In this section, we present extensive experimental results to evaluate the performance of our proposed algorithm (SMFR-Saturate) on two benchmark problems, namely Maximum Coverage and Recommendation, using several real-world data sets. |
| Researcher Affiliation | Academia | Adriano Fazzone EMAIL CENTAI Institute, Turin, Italy Yanhao Wang EMAIL East China Normal University, Shanghai, China Francesco Bonchi EMAIL CENTAI Institute, Turin, Italy Eurecat, Barcelona, Spain |
| Pseudocode | Yes | Algorithm 1: SMFR-Saturate |
| Open Source Code | Yes | Data and code are available publicly at https://github.com/adrianfaz/ Fair-Representation-in-Submodular-Subset-Selection-A-Pareto-Optimization-Approach. |
| Open Datasets | Yes | The Facebook data set (Mislove et al., 2010) is an undirected graph of 1, 216 nodes and 42, 443 edges representing the friendships between Rice University students on Facebook, and the DBLP data set (Dong et al., 2023) is an undirected graph of 3, 980 nodes and 6, 966 edges denoting the coauthorships between researchers. ... The X-Wines data set consists of 150 000 ratings from 10 561 users on 1 007 wines ... The Movie Lens data set consists of 100 000 ratings from 600 users on 9 000 movies ... The Pokec (Kosicky) data set is an undirected graph with 234, 320 nodes and 2, 417, 175 edges, extracted from the Pokec8 data set. ... The Movie Lens-25M 9 data set is a larger version of the Movie Lens data set presented in Section 5. |
| Dataset Splits | No | The paper discusses how nodes are partitioned into communities or groups for defining representativeness functions and constraints (e.g., 'dividing all nodes into 4 disjoint groups based on a sensitive attribute'). It does not provide explicit details on training, validation, or test dataset splits typically used for model evaluation or reproduction. |
| Hardware Specification | Yes | All experiments were run on a Mac Book Pro laptop with an Apple M1 Max processor and 32GB memory running Mac OS 14. |
| Software Dependencies | No | We implemented them in Python 3, and for the OPT algorithm, we applied the Gurobi3 optimizer to solve the ILP formulations of the Maximum Coverage and Recommendation instances. All algorithms except OPT were accelerated using the lazy-forward strategy (Leskovec et al., 2007). ... 5https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html |
| Experiment Setup | Yes | We fix k = 40 for the knapsack constraint and k = 5 (and thus r = 20) for the matroid constraint. We set d = 1, 2, and 4 by considering the representativeness functions on the first group C1, the first two groups C1 and C2, and all four groups from C1 to C4. ... We also set d = 1, 2, and 4 by considering the representativeness functions on C1, C1&C2, and C1 C4. Only solutions with β 0.8 are considered for d = 1, β 0.4 for d = 2, and β 0.2 for d = 4. |