Interactive Label Cleaning with Example-based Explanations
Authors: Stefano Teso, Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive empirical evaluation shows that clarifying the reasons behind the model's suspicions by cleaning the counter-examples helps in acquiring substantially better data and models, especially when paired with our FIM approximation. We empirically address the following research questions: Q1: Do counter-examples contribute to cleaning the data? Q2: Which influence-based selection strategy identifies the most mislabeled counter-examples? Q3: What contributes to the effectiveness of the best counter-example selection strategy? |
| Researcher Affiliation | Academia | Stefano Teso, University of Trento, Trento, Italy; Andrea Bontempelli, University of Trento, Trento, Italy; Fausto Giunchiglia, University of Trento, Trento, Italy; Andrea Passerini, University of Trento, Trento, Italy |
| Pseudocode | Yes | The pseudo-code of CINCER is listed in Algorithm 1. |
| Open Source Code | Yes | The code for all experiments is available at: https://github.com/abonte/cincer. |
| Open Datasets | Yes | Data sets. We used a diverse set of classification data sets: Adult [27]: data set of 48,800 persons... Breast [27]: data set of 569 patients... 20NG [27]: data set of newsgroup posts... MNIST [29]: handwritten digit recognition data set... Fashion [30]: fashion article classification dataset... |
| Dataset Splits | Yes | For adult and breast, a random 80 : 20 training-test split is used, while for MNIST, fashion and 20NG the split provided with the data set is used. |
| Hardware Specification | Yes | All experiments were run on a 12-core machine with 16 GiB of RAM and no GPU. |
| Software Dependencies | No | We implemented CINCER using Python and TensorFlow [25] on top of three classifiers and compared different counter-example selection strategies on five data sets. |
| Experiment Setup | Yes | Upon receiving a new example, the classifier is retrained from scratch for 100 epochs using Adam [31] with default parameters, with early stopping when the accuracy on the training set reaches 90% for FC and CNN, and 70% for LR. The margin threshold is set to τ = 0.2. |
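The setup quoted in the last two rows can be illustrated with a minimal NumPy sketch. This is not the authors' code (their implementation uses TensorFlow with Adam; plain gradient descent on a toy logistic regression stands in here), and the toy data, the `fit` helper, and the assumption that the margin is the gap between the two most probable classes are illustrative only. It shows the random 80:20 split, early stopping once training accuracy reaches the target (70% for LR), and a suspicion check against the margin threshold τ = 0.2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable binary data standing in for a tabular data set.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)

# Random 80:20 training-test split, as used for adult and breast.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train, test = idx[:cut], idx[cut:]

def accuracy(w, X, y):
    return np.mean(((X @ w) > 0).astype(float) == y)

def fit(X, y, target_acc=0.7, max_epochs=100, lr=0.1):
    """Logistic regression trained by gradient descent, stopping early
    as soon as training accuracy reaches the target (the paper's rule)."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)     # gradient step
        if accuracy(w, X, y) >= target_acc:
            break
    return w

w = fit(X[train], y[train])

def is_suspicious(probs, tau=0.2):
    """Flag an incoming example when the gap between its two most
    probable classes falls below the margin threshold tau."""
    top2 = np.sort(probs)[-2:]
    return bool(top2[1] - top2[0] < tau)
```

In this sketch the classifier retrains until its training accuracy crosses the threshold or the epoch budget of 100 runs out, mirroring the retrain-from-scratch step performed after each newly received example.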