From Search to Sampling: Generative Models for Robust Algorithmic Recourse

Authors: Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then evaluate on three large real-life datasets that are popularly used in the recourse literature and compare our results with eight existing methods. We show that our method achieves (1) the best score combining validity, proximity and plausibility, (2) is more robust to changes in the cost magnitude compared to SOTA likelihood-based methods, and (3) more faithfully learns the conditional density of recourse than methods that separately learn unconditional data density.
Researcher Affiliation | Academia | Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi (Indian Institute of Technology Bombay)
Pseudocode | Yes | Algorithm 1: Training recourse model Rθ. Require: training data D, learning rate η, batch size b, epochs e, sampling parameter K, cost coefficient λ, pair validity γ, classifier h. Ensure: trained model Rθ with parameters θ.
Open Source Code | Yes | Our code is available at: https://github.com/prateekgargx/genre.
Open Datasets | Yes | We experimented with benchmark datasets commonly used to evaluate recourse algorithms: Adult Income (Becker & Kohavi, 1996), FICO HELOC (FICO, 2018), and COMPAS (Angwin et al., 2016).
Dataset Splits | Yes | For each dataset, we train a Random Forest (RF) classifier to mimic the latent decision-making model, which assigns the gold labels... Per-dataset statistics as tabulated in the paper (the final three columns appear to be train size, test size, and classifier accuracy in %):
Adult Income  13  7  2  8,742  27,877  36,624  12,208  77.33
COMPAS         7  4  2  3,764     865   4,629   1,543  69.60
HELOC         21  0  0  3,548   3,855   7,403   2,468  74.23
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory amounts used for experiments.
Software Dependencies | No | We use a Transformer (Vaswani et al., 2017) from PyTorch (Paszke et al., 2019) with learned position embeddings... We ensure that the RF classifier is calibrated by using the CalibratedClassifierCV API from sklearn (Pedregosa et al., 2011). The paper mentions software such as PyTorch and scikit-learn, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The classifier h(x) is an Artificial Neural Network (ANN), a ReLU-based model with three hidden layers of size 10 each, trained with a learning rate of 0.001 for 100 epochs using a batch size of 64... For training Rθ, we use a Transformer (Vaswani et al., 2017) from PyTorch (Paszke et al., 2019) with learned position embeddings, embedding size 32, 16 layers in each of the encoder and decoder, and 8 heads. The number of bins in the last layer is 50. We choose the value of λ = 5.0 when sampling training pairs. During inference (Algorithm 2), we set the temperature for bin selection to τ = 10.00 and σ = 0.00, generate 10 samples, and choose the sample that gets the highest probability from the classifier h(x). In Appendix D.2, we provide results for other values of τ and σ. We describe other relevant hyperparameters in Appendix C.2. All datasets use a learning rate of 1e-4; the batch size is 16,384 for Adult Income and 2,048 for the other datasets.
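The inference procedure quoted above (sample several candidate recourses via temperature-scaled bin selection, then keep the candidate the classifier scores highest) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `bin_logits_per_feature`, `bin_centers`, and `classifier_prob` are hypothetical stand-ins for the trained model's per-feature bin logits, the bin-center values used to decode a bin into a feature value, and the classifier h(x).

```python
import math
import random

def softmax(logits, tau):
    """Temperature-scaled softmax; a higher tau flattens the bin distribution."""
    scaled = [l / tau for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample_bin(logits, tau, rng):
    """Sample one bin index from the temperature-scaled distribution."""
    probs = softmax(logits, tau)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def select_recourse(bin_logits_per_feature, bin_centers, classifier_prob,
                    num_samples=10, tau=10.0, rng=None):
    """Draw num_samples candidate recourses by sampling one bin per feature,
    then return the candidate that the classifier scores highest."""
    rng = rng or random.Random(0)
    best_x, best_p = None, -1.0
    for _ in range(num_samples):
        x = [bin_centers[j][sample_bin(logits, tau, rng)]
             for j, logits in enumerate(bin_logits_per_feature)]
        p = classifier_prob(x)
        if p > best_p:
            best_x, best_p = x, p
    return best_x, best_p
```

With τ = 10 (as in the reported setup) the bin distribution stays relatively flat, so the 10 samples are diverse and the classifier-based selection does the final filtering.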