A Unifying Post-Processing Framework for Multi-Objective Learn-to-Defer Problems

Authors: Mohammad-Amin Charusaie, Samira Samadi

NeurIPS 2024

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental
  LLM Response: "Furthermore, we design a generalizable algorithm to estimate that solution and apply this algorithm to the COMPAS, Hatespeech, and ACSIncome datasets. Our algorithm shows improvements in terms of constraint violation over a set of learn-to-defer baselines and can control multiple constraint violations at once."

Researcher Affiliation: Academia
  LLM Response: "Mohammad-Amin Charusaie, Max Planck Institute for Intelligent Systems, Tuebingen, Germany (EMAIL); Samira Samadi, Max Planck Institute for Intelligent Systems, Tuebingen, Germany (EMAIL)"

Pseudocode: Yes
  LLM Response: "Based on this optimal solution, we can design a plug-in method (see Algorithm 1 in Appendix F) to solve the constrained learning problem using empirical data."

Open Source Code: Yes
  LLM Response: "The code is available in https://github.com/AminChrs/PostProcess/."

Open Datasets: Yes
  LLM Response: "Our algorithm shows improvements in terms of constraint violation over a set of learn-to-defer baselines and can control multiple constraint violations at once. The use of d-GNP is beyond learn-to-defer applications and can potentially obtain a solution to decision-making problems with a set of controlled expected performance measures."

Dataset Splits: Yes
  LLM Response: "n is the size of the set using which we fine-tune the algorithm, ϵ measures the accuracy of learned post-processing scores, and γ is a parameter that measures the sensitivity of the constraint to the change of the predictor."

Hardware Specification: No
  LLM Response: "The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments."

Software Dependencies: No
  LLM Response: "The paper mentions training on a '1-layer feed-forward neural network' and using a 'pre-trained model [5]' but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, Python version)."

Experiment Setup: Yes
  LLM Response: "All scores, classifiers, and rejection functions are trained on a 1-layer feed-forward neural network. The human assessment is done in this dataset on 1000 cases by giving humans a description of the case and asking them whether the defendant would recidivate within two years of their most recent crime."
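For concreteness, a "1-layer feed-forward neural network" of the kind the experiment setup mentions can be sketched as follows. The paper specifies neither the framework, the layer sizes, nor the activation, so every dimension and design choice below (ReLU hidden layer, softmax output, input width 10, hidden width 32) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

class OneLayerFF:
    """Single-hidden-layer feed-forward classifier (all sizes assumed)."""

    def __init__(self, n_in, n_hidden, n_out):
        # Small random weights; the paper does not state an init scheme.
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = np.maximum(0.0, x @ self.W1 + self.b1)        # ReLU hidden layer
        logits = h @ self.W2 + self.b2
        e = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)          # class probabilities

# Example: a batch of 4 inputs with 10 features, binary output
probs = OneLayerFF(n_in=10, n_hidden=32, n_out=2).forward(rng.normal(size=(4, 10)))
```

Such a network could serve as any of the scores, classifiers, or rejection functions the setup refers to; training (loss, optimizer) is omitted since the paper gives no such details.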