XAudit: A Learning-Theoretic Look at Auditing with Explanations

Authors: Chhavi Yadav, Michal Moshkovitz, Kamalika Chaudhuri

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our proposed auditors on standard datasets. Our experiments show that, unlike the worst case, typical anchors significantly reduce the number of queries needed to audit linear classifiers for feature sensitivity. Additionally, our proposed Anchor Augmentation technique helps reduce the query complexity over a no-anchor approach. Similarly, our experiments on decision trees demonstrate that the average number of queries to audit feature sensitivity is considerably lower than the number of nodes in the tree. In this section we conduct experiments on standard datasets to test some aspects of feature sensitivity auditing.
Researcher Affiliation | Collaboration | Chhavi Yadav (EMAIL), UC San Diego; Michal Moshkovitz (EMAIL), Bosch Center for AI; Kamalika Chaudhuri (EMAIL), UC San Diego, Meta AI
Pseudocode | Yes | Algorithm 1 Alg LCc: Auditing Linear Classifiers using Counterfactuals; Algorithm 2 Alg LCa: Auditing Linear Classifiers using Anchors; Algorithm 3 Alg DT: Auditing Decision Trees using decision path explanations; Algorithm 4 A General Auditor for finite hypothesis classes; Algorithm 5 check_stopping_condition(); Algorithm 6 picking_next_query(); Algorithm 7 update_search_space(); Algorithm 8 Alg LCc: Auditing Linear Classifiers using Counterfactuals; Algorithm 9 Alg LCa: Auditing Linear Classifiers using Anchors; Algorithm 10 findpath(x, P); Algorithm 11 perturb(x, p)
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a link to a code repository. It only mentions using "the matlab code provided by Alabdulmohsin et al. (2015)" for some experiments.
Open Datasets | Yes | The datasets used in our experiments are Adult Income (Becker and Kohavi, 1996), Covertype (Blackard, 1998) and Credit Default (Yeh, 2016).
Dataset Splits | Yes | We use an 80-20 split to create the train-test sets.
Hardware Specification | No | All of our experiments ran on a CPU, taking from a minute to 2-3 hours.
Software Dependencies | No | We learn both our linear and tree classifiers using scikit-learn. To run the anchor experiments, we use the MATLAB code provided by Alabdulmohsin et al. (2015).
Experiment Setup | Yes | We learn decision trees for each of the datasets using scikit-learn, which implements CART (Breiman, 2017) to construct the tree. All of these use the FoI to make predictions. We vary tree depth by fixing the max-depth hyperparameter. Then we freeze the tree and run Alg DT 1000 times. We report the average of the total queries required to audit across the 1000 runs. We set the augmentation size (number of anchor points augmented) to a maximum of 30.
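The split and tree-training setup described in the rows above can be sketched with scikit-learn. This is a minimal sketch, not the paper's code: the synthetic data stands in for Adult Income / Covertype / Credit Default, and `audit_feature_sensitivity` is a hypothetical placeholder for the paper's Alg DT, whose actual query logic is not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real datasets used in the paper.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# 80-20 train-test split, as reported in the "Dataset Splits" row.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# scikit-learn's DecisionTreeClassifier implements CART; tree depth is
# varied by fixing the max_depth hyperparameter.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)

def audit_feature_sensitivity(model, X_pool, rng):
    """Hypothetical stand-in for Alg DT: issues queries against the
    frozen model and returns how many queries this run needed.
    Here it issues a single placeholder query."""
    x = X_pool[rng.integers(len(X_pool))]
    model.predict(x.reshape(1, -1))  # one black-box query to the model
    return 1

# Freeze the tree and average the query count over 1000 audit runs,
# mirroring the reporting protocol in the "Experiment Setup" row.
rng = np.random.default_rng(0)
queries = [audit_feature_sensitivity(tree, X_test, rng) for _ in range(1000)]
print(np.mean(queries))
```

The point of the protocol is that the model is trained once and then held fixed, so the reported average reflects only the auditor's query behavior, not retraining variance.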