XAudit: A Learning-Theoretic Look at Auditing with Explanations

Authors: Chhavi Yadav, Michal Moshkovitz, Kamalika Chaudhuri

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our proposed auditors on standard datasets. Our experiments show that, unlike the worst case, typical anchors significantly reduce the number of queries needed to audit linear classifiers for feature sensitivity. Additionally, our proposed Anchor Augmentation technique helps reduce the query complexity over a no-anchor approach. Similarly, our experiments on decision trees demonstrate that the average number of queries to audit feature sensitivity is considerably lower than the number of nodes in the tree. In this section we conduct experiments on standard datasets to test some aspects of feature sensitivity auditing.
Researcher Affiliation | Collaboration | Chhavi Yadav (EMAIL), UC San Diego; Michal Moshkovitz (EMAIL), Bosch Center for AI; Kamalika Chaudhuri (EMAIL), UC San Diego, Meta AI
Pseudocode | Yes | Algorithm 1 Alg LCc: Auditing Linear Classifiers using Counterfactuals; Algorithm 2 Alg LCa: Auditing Linear Classifiers using Anchors; Algorithm 3 Alg DT: Auditing Decision Trees using decision path explanations; Algorithm 4 A General Auditor for finite hypothesis classes; Algorithm 5 check_stopping_condition(); Algorithm 6 picking_next_query(); Algorithm 7 update_search_space(); Algorithm 8 Alg LCc: Auditing Linear Classifiers using Counterfactuals; Algorithm 9 Alg LCa: Auditing Linear Classifiers using Anchors; Algorithm 10 findpath(x, P); Algorithm 11 perturb(x, p)
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a link to a code repository. It only mentions using "the matlab code provided by Alabdulmohsin et al. (2015)" for some experiments.
Open Datasets | Yes | The datasets used in our experiments are Adult Income (Becker and Kohavi, 1996), Covertype (Blackard, 1998) and Credit Default (Yeh, 2016).
Dataset Splits | Yes | We use an 80-20 split to create the train-test sets.
Hardware Specification | No | All of our experiments ran on a CPU, taking from a minute to 2-3 hours.
Software Dependencies | No | We learn both our linear and tree classifiers using scikit-learn. To run the anchor experiments, we use the MATLAB code provided by Alabdulmohsin et al. (2015).
Experiment Setup | Yes | We learn decision trees for each of the datasets using scikit-learn, which implements CART (Breiman, 2017) to construct the tree. All of these use the FoI to make predictions. We vary tree depth by fixing the max-depth hyperparameter. Then we freeze the tree and run Alg DT 1000 times. We report the average of the total queries required to audit across the 1000 runs. We set the augmentation size (number of anchor points augmented) to a maximum of 30.
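The split and tree-training setup described in the rows above can be sketched with scikit-learn. This is a minimal sketch, not the paper's code: the synthetic data stands in for Adult Income / Covertype / Credit Default, and `audit_feature_sensitivity` is a hypothetical placeholder for the paper's Alg DT, whose actual query logic is not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real datasets used in the paper.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# 80-20 train-test split, as reported in the "Dataset Splits" row.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# scikit-learn's DecisionTreeClassifier implements CART; tree depth is
# varied by fixing the max_depth hyperparameter.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)

def audit_feature_sensitivity(model, X_pool, rng):
    """Hypothetical stand-in for Alg DT: issues queries against the
    frozen model and returns how many queries this run needed.
    Here it issues a single placeholder query."""
    x = X_pool[rng.integers(len(X_pool))]
    model.predict(x.reshape(1, -1))  # one black-box query to the model
    return 1

# Freeze the tree and average the query count over 1000 audit runs,
# mirroring the reporting protocol in the "Experiment Setup" row.
rng = np.random.default_rng(0)
queries = [audit_feature_sensitivity(tree, X_test, rng) for _ in range(1000)]
print(np.mean(queries))
```

The point of the protocol is that the model is trained once and then held fixed, so the reported average reflects only the auditor's query behavior, not retraining variance.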