Human and AI Perceptual Differences in Image Classification Errors
Authors: Minghao Liu, Jiaheng Wei, Yang Liu, James Davis
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This study first analyzes the statistical distributions of mistakes from the two sources and then explores how task difficulty level affects these distributions. We find that even when AI learns an excellent model from the training data, one that outperforms humans in overall accuracy, these AI models have significant and consistent differences from human perception. We demonstrate the importance of studying these differences with a simple human-AI teaming algorithm that outperforms humans alone, AI alone, or AI-AI teaming. |
| Researcher Affiliation | Academia | Minghao Liu¹, Jiaheng Wei², Yang Liu¹, James Davis¹. ¹ Department of Computer Science and Engineering, University of California, Santa Cruz; ² Data Science and Analytics Thrust, Hong Kong University of Science and Technology (Guangzhou). |
| Pseudocode | No | The paper describes methods and algorithms but does not present any structured pseudocode or algorithm blocks. The mathematical formulations are presented in standard prose or equations. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository or supplementary materials for code. |
| Open Datasets | Yes | The top-ranked machine vision models can achieve extremely high accuracy on CIFAR-10 (Krizhevsky, Hinton et al. 2009) image classification by training on clean labels. ... In this paper, we adopt CIFAR-N (Wei et al. 2022c), a label-noise benchmark that provides three noisy human annotations for each image of the CIFAR-10 training dataset. |
| Dataset Splits | Yes | We split the training set into a 40K training subset and a 10K test subset. We explore human perceptual differences using these noisy human annotations. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running the experiments. It only discusses software, datasets, and general experimental results without specifying the underlying hardware. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries or frameworks used in the experiments. It mentions neural networks and machine learning classifiers in general but no concrete software dependencies with versions. |
| Experiment Setup | No | The paper discusses various machine learning models and their accuracy but does not provide specific hyperparameters like learning rates, batch sizes, number of epochs, or optimizer settings. It describes the general methodology and evaluations but lacks the detailed configuration necessary for reproduction. |
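The 40K/10K split of the CIFAR-10 training set reported under "Dataset Splits" can be sketched as a simple index partition. This is a minimal illustration only: the paper does not specify its splitting procedure, and the function name and seed below are assumptions.

```python
import random

def split_train_subset(n_total=50_000, n_train=40_000, seed=0):
    """Partition CIFAR-10 training indices into a 40K training subset
    and a 10K held-out test subset (sizes from the paper; the shuffling
    procedure and seed are illustrative assumptions)."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_train_subset()
print(len(train_idx), len(test_idx))  # 40000 10000
```

Any deterministic, disjoint partition of the 50K training indices would match the split sizes the paper reports.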