LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
Authors: Fanfei Li, Thomas Klein, Wieland Brendel, Robert Geirhos, Roland S. Zimmermann
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To address this, we introduce LAION-C as a benchmark alternative for ImageNet-C. LAION-C consists of six novel distortion types specifically designed to be OOD, even for web-scale datasets such as LAION. In a comprehensive evaluation of state-of-the-art models, we find that the LAION-C dataset poses significant challenges to contemporary models, including MLLMs such as Gemini and GPT-4o. We additionally conducted a psychophysical experiment to evaluate the difficulty of our corruptions for human observers, enabling a comparison of models to lab-quality human robustness data. |
| Researcher Affiliation | Collaboration | ¹Max Planck Institute for Intelligent Systems, Tübingen, Germany ²ELLIS Institute Tübingen ³Tübingen AI Center ⁴Google DeepMind. Correspondence to: Fanfei Li <EMAIL>. |
| Pseudocode | No | The paper describes the construction of new OOD distortions (e.g., Mosaic, Glitched, Vertical Lines, Geometric Shapes, Stickers, Luminance Checkerboard) with parameters for different intensity levels (Tables 2-7). However, it does not present these procedures as formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The evaluation code for LAION-C is publicly available at: https://github.com/FanfeiLi/LAION-C. |
| Open Datasets | Yes | The LAION-C dataset is published on Zenodo. A link to the dataset is provided via the GitHub repository. |
| Dataset Splits | No | Since the dataset is primarily used for benchmarking purposes, splitting specifics are not provided. Essentially, the entire dataset is a validation set. |
| Hardware Specification | Yes | Our experiments were conducted in a darkened cabin, using a 22" VIEWPixx 3D light LCD monitor (VPixx Technologies, Saint-Bruno, Canada) at a refresh rate of 120 Hz (scanning backlight mode on). The screen measures 484 × 302 mm, at a resolution of 1920 × 1200 pixels. ... The experiment was implemented using the Psychophysics Toolbox (Kleiner et al., 2007, version 3.0.12) in MATLAB (Release 2016a, The MathWorks, Inc., Natick, Massachusetts, United States) using a 12-core desktop computer (AMD HD7970 graphics card "Tahiti" by AMD, Sunnyvale, California, United States) running Kubuntu 14.04 LTS. |
| Software Dependencies | Yes | The experiment was implemented using the Psychophysics Toolbox (Kleiner et al., 2007, version 3.0.12) in MATLAB (Release 2016a, The MathWorks, Inc., Natick, Massachusetts, United States) using a 12-core desktop computer (AMD HD7970 graphics card "Tahiti" by AMD, Sunnyvale, California, United States) running Kubuntu 14.04 LTS. |
| Experiment Setup | Yes | Participants were given 2.5 s to view each image, followed by a 2 s response window to classify the image by clicking on a set of icons. ... To motivate high performance, a monetary bonus was awarded for surpassing fixed, predetermined performance thresholds for each block. |
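
Because the whole dataset serves as a validation set and each distortion comes at several intensity levels, results on a benchmark of this kind are naturally reported per distortion and per level. The sketch below shows one way such an aggregation might look. It is not taken from the LAION-C repository: the record layout `(distortion, level, correct)` and the function name `accuracy_by_condition` are hypothetical, and the toy records are made up purely to exercise the code. Only the six distortion names come from the paper.

```python
from collections import defaultdict

# The six distortion names come from the paper's description. Everything else
# here (record layout, function name, toy data) is a hypothetical illustration.
DISTORTIONS = (
    "Mosaic", "Glitched", "Vertical Lines",
    "Geometric Shapes", "Stickers", "Luminance Checkerboard",
)


def accuracy_by_condition(records):
    """Return {(distortion, level): accuracy} from (distortion, level, correct) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for distortion, level, correct in records:
        if distortion not in DISTORTIONS:
            raise ValueError(f"unknown distortion: {distortion!r}")
        key = (distortion, level)
        totals[key] += 1
        hits[key] += int(bool(correct))
    # Every key in totals has at least one record, so the division is safe.
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Made-up records, not results from the paper.
    toy = [("Mosaic", 1, True), ("Mosaic", 1, False), ("Stickers", 3, True)]
    for key, acc in sorted(accuracy_by_condition(toy).items()):
        print(key, acc)
```

The same function could be applied to model predictions and to human responses from the psychophysical experiment, provided both are logged in the assumed record format, which would make the two sets of accuracies directly comparable per condition.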