In-distribution adversarial attacks on object recognition models using gradient-free search.
Authors: Spandan Madan, Tomotake Sasaki, Hanspeter Pfister, Tzu-Mao Li, Xavier Boix
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train models on data sampled from parametric distributions, then search inside this data distribution to find in-distribution adversarial examples. This is done using our gradient-free evolution strategies (ES) based approach, which we call CMA-Search. Despite training with a large-scale (0.5 million images), unbiased dataset of camera and light variations, CMA-Search can find a failure inside the data distribution in over 71% of cases by perturbing the camera position. With lighting changes, CMA-Search finds misclassifications in 42% of cases. These findings also extend to natural images from the ImageNet and Co3D datasets. |
| Researcher Affiliation | Collaboration | Spandan Madan (EMAIL), Harvard University; Tomotake Sasaki (EMAIL), Fujitsu Limited; Hanspeter Pfister (pfister@seas.harvard.edu), Harvard University; Tzu-Mao Li (EMAIL), UCSD; Xavier Boix (EMAIL), Fujitsu Research of America |
| Pseudocode | Yes | Algorithm 1 CMA-Search over camera parameters to find in-distribution adversarial examples. |
| Open Source Code | Yes | All code, datasets, and demos are available at https://github.com/Spandan-Madan/in_distribution_adversarial_examples. |
| Open Datasets | Yes | These findings also extend to natural images from the ImageNet and Co3D datasets, i.e., the Common Objects in 3D (Co3D) dataset (Reizenstein et al., 2021). We simply sample camera and lighting parameters from a fixed, uniform distribution, and render a subset of 3D models from ShapeNet (Chang et al., 2015) with the sampled camera and lighting parameters. |
| Dataset Splits | Yes | All models were trained on 0.5 million rendered images across 11 categories, with 1000 images for every 3D model. For Co3D: The training dataset was constructed by sampling uniformly across videos from 5 categories (car, chair, handbag, laptop, and teddy bear). This amounts to 187,200 training images, or 38,000 images per category, which is 32 times the ImageNet training set on a per-category basis. An in-distribution test set of 68,854 images was generated by sampling the remaining frames from these categories. |
| Hardware Specification | Yes | All experiments were conducted on a compute cluster consisting of 8 NVIDIA Tesla K80 GPUs, and all models were trained on a single GPU at a time. |
| Software Dependencies | No | Algorithm 1 provides an outline for the method, which was implemented using pycma (Hansen & Ostermeier, 1996; Hansen et al., 2019). Explanation: While pycma is mentioned as a software component, no specific version number for it or any other software (e.g., Python, TensorFlow, PyTorch) is provided. |
| Experiment Setup | Yes | A 5-layer multi-layer perceptron (MLP) with ReLU activations was used, with the output dimensionality of hidden layers set to 5D, D, D/5, D/5, and 2, respectively. ... All models were trained for 100 epochs with stochastic gradient descent (SGD) with a learning rate of 0.0001. All CNN models were trained with a batch size of 75 images, while transformers were trained with a batch size of 25. Models were trained for 50 epochs with an Adam optimizer with a fixed learning rate of 0.0003. |
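The gradient-free search idea behind CMA-Search can be illustrated with a minimal sketch. The paper's actual implementation uses pycma's full CMA-ES over camera (or lighting) parameters; the simplified evolution-strategy loop below, and the toy `classifier_margin` objective standing in for "render the scene and score the true class", are hypothetical assumptions for illustration only.

```python
import random

def classifier_margin(params):
    # Hypothetical stand-in for: render the scene with these camera
    # parameters and return the model's score for the true class.
    # Here, a toy function whose maximum sits at (0.3, -0.2, 0.0).
    x, y, z = params
    return -((x - 0.3) ** 2 + (y + 0.2) ** 2 + z ** 2)

def es_search(x0, sigma=0.1, popsize=8, iters=50, seed=0):
    """Minimal evolution-strategy loop: sample Gaussian perturbations
    of the current camera parameters and keep the candidate that most
    reduces the classifier's score, driving it toward misclassification."""
    rng = random.Random(seed)
    best, best_fit = list(x0), classifier_margin(x0)
    for _ in range(iters):
        candidates = [[p + rng.gauss(0.0, sigma) for p in best]
                      for _ in range(popsize)]
        fits = [classifier_margin(c) for c in candidates]
        i = min(range(popsize), key=fits.__getitem__)  # minimize the score
        if fits[i] < best_fit:
            best, best_fit = candidates[i], fits[i]
    return best, best_fit
```

With pycma, the inner loop would instead use `cma.CMAEvolutionStrategy` with its `ask`/`tell` interface, which also adapts the covariance of the search distribution rather than using a fixed `sigma`.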
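The hidden-layer sizes quoted in the setup (5D, D, D/5, D/5, 2 for input dimensionality D) can be written out as a small helper. This is a sketch of the stated layer widths, not the authors' code; using integer division when D is not a multiple of 5 is an assumption.

```python
def mlp_layer_dims(D):
    """Output dimensionalities of the 5-layer MLP's layers, as described:
    5D, D, D/5, D/5, and a final 2-way output. Integer division is an
    assumption for inputs where D is not divisible by 5."""
    return [5 * D, D, D // 5, D // 5, 2]
```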