Personalized Algorithmic Recourse with Preference Elicitation
Authors: Giovanni De Toni, Paolo Viappiani, Stefano Teso, Bruno Lepri, Andrea Passerini
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation on real-world datasets highlights how PEAR produces high-quality personalized recourse in only a handful of iterations. |
| Researcher Affiliation | Academia | Giovanni De Toni EMAIL Augmented Intelligence Center, Fondazione Bruno Kessler, Italy DISI, University of Trento, Italy Paolo Viappiani EMAIL LAMSADE, CNRS, Université Paris-Dauphine, PSL, France Stefano Teso EMAIL CIMeC & DISI, University of Trento, Italy Bruno Lepri EMAIL Augmented Intelligence Center, Fondazione Bruno Kessler, Italy Andrea Passerini EMAIL DISI, University of Trento, Italy |
| Pseudocode | Yes | A high-level overview of PEAR is given in Fig. 1 and the pseudo-code is listed in Algorithm 1. Algorithm 1 The PEAR algorithm: h : S → {0, 1} is a classifier, s(0) ∈ S the initial state, A the available actions, p(w) the prior, T ≥ 1 the query budget, k ≥ 2 is the size of choice sets. Algorithm 2 Greedy procedure to efficiently compute a choice set O: s(t) ∈ S the current state, A the available actions, k ≥ 2 is the size of choice sets, D(t) the user choices so far. |
| Open Source Code | Yes | We implemented PEAR, the competitors and the black box classifiers using Python (>= 3.7) and PyTorch (Paszke et al., 2019). For reproducibility purposes, the code and the pre-trained models are freely available online (https://github.com/unitn-sml/pear-personalized-algorithmic-recourse). |
| Open Datasets | Yes | We evaluated our approach on two real-world datasets taken from the relevant literature: Give Me Some Credit (Kaggle, 2011) and Adult (Dua & Graff, 2017). |
| Dataset Splits | Yes | We then split the data into training (70%), validation (10%) and test (20%) sets. |
| Hardware Specification | Yes | All the experiments were run on a virtual machine running CentOS 7.6.18 with 165 cores and 25 GiB of RAM. |
| Software Dependencies | Yes | We implemented PEAR, the competitors and the black box classifiers using Python (>= 3.7) and PyTorch (Paszke et al., 2019). ... We used the original code for both FARE and CSCF, with minimal modifications to make them compatible with our experimental setting. For FACE, we used the implementation available in the CARLA library (Pawelczyk et al., 2021). ... We also manually performed additional standard data engineering tasks, such as removing entries with null values or checking for potential outliers. After the data cleaning and preprocessing steps, we kept the following features for each dataset: ... For Adult, we adopted the same action set used by De Toni et al. (2023), while for Give Me Some Credit we devised the functions ourselves. ... We one-hot encoded categorical features and we performed min-max normalization for the continuous features using scikit-learn (Pedregosa et al., 2011). |
| Experiment Setup | Yes | For PEAR, we vary the number of questions T to the user from 0 to 10. For T = 0, we initialize the weights with the expected value of the prior, E_p(w)[w], that represents a user-independent population-based prior. Moreover, we employ two user response models, the noiseless model (Eq. (11)), to check the effectiveness of our approach in the best-case scenario where the user can perfectly express their preferences, and the logistic model (Eq. (10)), to challenge our approach in a more realistic scenario. ... In our experiments, we set the number of simulations to 15 and 10 for Adult and Give Me Some Credit, respectively. We also set the maximum intervention length to 6 and 8, for Adult and Give Me Some Credit, respectively. The values c_act_cost and c_puct are also hyperparameters. We set them to 1 and 0.5 respectively, for both experiments. ... We optimize Eq. (14) via Adam and we set the learning rate to 0.001 for Adult, and 0.003 for Give Me Some Credit. ... During training, we set the number of simulations to 15 and 10, for Adult and Give Me Some Credit, respectively. The noise fraction is instead set to ϵP = 0.3 for both, with ηp = 0.3. At inference time, we add no noise (ϵP = 0) and the number of simulations is fixed at 5. ... We set the population size, p = 50, and the maximum number of generations, n = 25, for both Adult and Give Me Some Credit, to keep the computation time manageable. ... In both Adult and Give Me Some Credit, we pick only 10% of the total instances. We set the number of neighbours, k, to 50 and the distance threshold to ϵ = 1.0 for both datasets. |
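The preprocessing and split described above (one-hot encoding of categorical features, min-max normalization of continuous features with scikit-learn, and a 70%/10%/20% train/validation/test split) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the synthetic data, variable names, and random seeds are our own assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (illustrative only; the paper uses Adult and
# Give Me Some Credit after null-value removal and outlier checks).
rng = np.random.default_rng(0)
n = 1000
X_cont = rng.normal(size=(n, 3))          # continuous features
X_cat = rng.integers(0, 4, size=(n, 1))   # one categorical feature

# 70/10/20 split: carve out 20% test first, then 1/8 of the
# remaining 80% as validation (0.125 * 0.8 = 0.10 overall).
idx = np.arange(n)
idx_trval, idx_te = train_test_split(idx, test_size=0.20, random_state=0)
idx_tr, idx_va = train_test_split(idx_trval, test_size=0.125, random_state=0)

# Fit the transforms on the training portion only, then apply everywhere.
scaler = MinMaxScaler().fit(X_cont[idx_tr])
encoder = OneHotEncoder(handle_unknown="ignore").fit(X_cat[idx_tr])

def preprocess(rows):
    cont = scaler.transform(X_cont[rows])          # min-max to [0, 1]
    cat = encoder.transform(X_cat[rows]).toarray() # one-hot columns
    return np.hstack([cont, cat])

X_tr, X_va, X_te = preprocess(idx_tr), preprocess(idx_va), preprocess(idx_te)
print(len(idx_tr), len(idx_va), len(idx_te))  # 700 100 200
```

Fitting the scaler and encoder on the training split alone avoids leaking test-set statistics into the normalization, which matches standard scikit-learn practice.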
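The excerpt mentions a c_puct hyperparameter (set to 0.5) and a per-step simulation budget, which suggests an MCTS-style search over interventions. The paper's exact selection rule is not quoted here, so as a hedged illustration the snippet below shows the standard PUCT score (AlphaZero-style), where a c_puct constant of this kind balances exploitation against prior-guided exploration; the function name and example numbers are our own.

```python
import math

def puct_score(q, prior, n_parent, n_child, c_puct=0.5):
    """Standard PUCT selection score: exploitation term q plus an
    exploration bonus scaled by the policy prior and visit counts."""
    return q + c_puct * prior * math.sqrt(n_parent) / (1 + n_child)

# With c_puct = 0.5 (the value reported above), an unvisited action
# with a strong prior can outrank a moderately valued visited one:
s_visited = puct_score(q=0.6, prior=0.1, n_parent=15, n_child=5)
s_fresh = puct_score(q=0.0, prior=0.9, n_parent=15, n_child=0)
print(s_fresh > s_visited)  # True
```

Smaller c_puct values make the search greedier with respect to estimated action values, which matters when the simulation budget is as small as the 5–15 simulations reported above.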