Approximations to worst-case data dropping: unmasking failure modes

Authors: Jenny Y. Huang, David R. Burt, Yunyi Shen, Tin D. Nguyen, Tamara Broderick

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In the present work, we systematically explore whether approximations can detect if there exists a very small fraction (< 1%) of data that, if dropped, can change conclusions. Across our synthetic and real-world data sets, we find that a simple recursive greedy algorithm is the sole algorithm that does not fail any of our tests, and also that it can be orders of magnitude faster to run than some competitors."
Researcher Affiliation | Collaboration | "Jenny Y. Huang EMAIL MIT EECS and MIT-IBM Watson AI Lab"
Pseudocode | No | The paper describes various approximation algorithms (e.g., AMIP, Additive One-Exact, Greedy One-Exact) in text, but it does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks.
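For intuition about the "Greedy One-Exact" family named above, here is a minimal sketch of one plausible reading of such a scheme, not the authors' implementation: repeatedly refit the model exactly with each remaining point left out, and drop the point whose removal moves the quantity of interest (here an OLS slope) furthest from the full-data estimate.

```python
import numpy as np

def ols_slope(x, y):
    # Slope of an ordinary-least-squares fit with intercept.
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def greedy_one_exact(x, y, k):
    """Greedily drop k points (a hypothetical sketch, not the paper's
    algorithm): at each step, refit exactly with every remaining point
    left out, and remove the point whose deletion moves the slope
    furthest from the full-data estimate."""
    full = ols_slope(x, y)                      # full-data estimate
    keep = np.ones(len(x), dtype=bool)
    for _ in range(k):
        idx = np.flatnonzero(keep)
        deltas = np.empty(len(idx))
        for j, i in enumerate(idx):
            keep[i] = False                     # exact leave-one-out refit
            deltas[j] = abs(ols_slope(x[keep], y[keep]) - full)
            keep[i] = True
        keep[idx[np.argmax(deltas)]] = False    # drop the worst offender
    return keep                                 # boolean mask of retained points
```

Each greedy step costs one exact refit per remaining point, which hints at why exact methods can be expensive relative to gradient-based approximations such as AMIP, and why the paper's runtime comparisons are of interest.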
Open Source Code | Yes | "Code for our work is available at gradient Based Data Dropping Failure Modes, including all scripts for reproducing the results in this paper."
Open Datasets | Yes | "The Single-cell Genomics and Ames Housing data sets demonstrate multi-outlier failure modes, while the Bird Morphometrics example demonstrates a one-outlier failure mode. Our first data set is taken from a study on the impact of sensory experiences on gene expression in the mouse visual cortex (Hrvatin et al., 2018). The Ames Housing data set provides a comprehensive collection of residential property data from Ames, Iowa, and is a widely utilized data set for regression modeling exercises (De Cock, 2011). This data set is taken from an ecological study on the morphometric features of the saltmarsh sparrow Ammodramus caudacutus (Zuur et al., 2010)."
Dataset Splits | No | "We might be concerned if dropping a small fraction α ∈ (0, 1) of our data changed our substantive conclusions." The value of α is user-defined; following Broderick et al. (2020), the paper uses α = 0.01 (i.e., 1% of the data) as a default. This refers to the fraction of data dropped for robustness checks, not to conventional train/test/validation splits for model training or evaluation.
Hardware Specification | Yes | "All experiments were conducted in Python 3 on a personal computer equipped with an Apple M1 Pro CPU at 3200 MHz and 16 GB of RAM."
Software Dependencies | Yes | "All experiments were conducted in Python 3 on a personal computer..." "Both versions are solved using exact solver methods supported from Gurobi 9.0 onwards."
Experiment Setup | Yes | "Setup. In realistic data settings, we may have a single data point far from the bulk of the data... To construct the plot in Figure 1 (left), we draw 1,000 red crosses by taking xn ∼ N(0, 1) i.i.d. and yn = −xn + ϵn with ϵn ∼ N(0, 1) i.i.d. Throughout, we use N(µ, σ²) to denote the normal distribution with mean µ and variance σ². The black dot appears at xn = yn = 10^6. We fit OLS with an intercept. The OLS-estimated slope on the full data set is nearly 1; after dropping the black point (less than 0.1% of the data set), the estimate is nearly −1, representing a sign change."
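The quoted setup can be reproduced in a few lines. This is a sketch with assumed details: the bulk is generated with slope −1 so that the reported flip from a slope near 1 (full data) to near −1 (outlier dropped) appears, and the seed and fitting routine are my choices, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                 # bulk covariates, x_n ~ N(0, 1)
y = -x + rng.normal(size=n)            # bulk responses: slope -1 plus N(0, 1) noise
x_full = np.append(x, 1e6)             # one high-leverage point at (10^6, 10^6)
y_full = np.append(y, 1e6)

slope_full = np.polyfit(x_full, y_full, 1)[0]   # OLS with intercept, all points
slope_dropped = np.polyfit(x, y, 1)[0]          # refit with the outlier removed
print(f"full: {slope_full:.2f}, dropped: {slope_dropped:.2f}")
```

The single point at (10^6, 10^6) dominates the least-squares objective, pulling the full-data slope to roughly 1; removing that one point (under 0.1% of the data) restores the bulk slope of roughly −1, the sign change described above.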