Approximations to worst-case data dropping: unmasking failure modes

Authors: Jenny Y. Huang, David R. Burt, Yunyi Shen, Tin D. Nguyen, Tamara Broderick

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In the present work, we systematically explore whether approximations can detect if there exists a very small fraction (< 1%) of data that, if dropped, can change conclusions. Across our synthetic and real-world data sets, we find that a simple recursive greedy algorithm is the sole algorithm that does not fail any of our tests, and also that it can be orders of magnitude faster to run than some competitors."
Researcher Affiliation | Collaboration | "Jenny Y. Huang EMAIL MIT EECS and MIT-IBM Watson AI Lab"
Pseudocode | No | The paper describes various approximation algorithms (e.g., AMIP, Additive One-Exact, Greedy One-Exact) in text, but it does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks.
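For intuition about the "Greedy One-Exact" family named above, here is a minimal sketch of one plausible reading of such a scheme, not the authors' implementation: repeatedly refit the model exactly with each remaining point left out, and drop the point whose removal moves the quantity of interest (here an OLS slope) furthest from the full-data estimate.

```python
import numpy as np

def ols_slope(x, y):
    # Slope of an ordinary-least-squares fit with intercept.
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def greedy_one_exact(x, y, k):
    """Greedily drop k points (a hypothetical sketch, not the paper's
    algorithm): at each step, refit exactly with every remaining point
    left out, and remove the point whose deletion moves the slope
    furthest from the full-data estimate."""
    full = ols_slope(x, y)                      # full-data estimate
    keep = np.ones(len(x), dtype=bool)
    for _ in range(k):
        idx = np.flatnonzero(keep)
        deltas = np.empty(len(idx))
        for j, i in enumerate(idx):
            keep[i] = False                     # exact leave-one-out refit
            deltas[j] = abs(ols_slope(x[keep], y[keep]) - full)
            keep[i] = True
        keep[idx[np.argmax(deltas)]] = False    # drop the worst offender
    return keep                                 # boolean mask of retained points
```

Each greedy step costs one exact refit per remaining point, which hints at why exact methods can be expensive relative to gradient-based approximations such as AMIP, and why the paper's runtime comparisons are of interest.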
Open Source Code | Yes | "Code for our work is available at gradient Based Data Dropping Failure Modes, including all scripts for reproducing the results in this paper."
Open Datasets | Yes | "The Single-cell Genomics and Ames Housing data sets demonstrate multi-outlier failure modes, while the Bird Morphometrics example demonstrates a one-outlier failure mode. Our first data set is taken from a study on the impact of sensory experiences on gene expression in the mouse visual cortex (Hrvatin et al., 2018). The Ames Housing data set provides a comprehensive collection of residential property data from Ames, Iowa, and is a widely utilized data set for regression modeling exercises (De Cock, 2011). This data set is taken from an ecological study on the morphometric features of the saltmarsh sparrow Ammodramus caudacutus (Zuur et al., 2010)."
Dataset Splits | No | "We might be concerned if dropping a small fraction α ∈ (0, 1) of our data changed our substantive conclusions." The value of α is user-defined; following Broderick et al. (2020), the paper uses α = 0.01 (i.e., 1% of the data) as a default. This refers to the fraction of data dropped for robustness checks, not to conventional train/test/validation splits for model training or evaluation.
Hardware Specification | Yes | "All experiments were conducted in Python 3 on a personal computer equipped with an Apple M1 Pro CPU at 3200 MHz and 16 GB of RAM."
Software Dependencies | Yes | "All experiments were conducted in Python 3 on a personal computer..." "Both versions are solved using exact solver methods supported from Gurobi 9.0 onwards."
Experiment Setup | Yes | "Setup. In realistic data settings, we may have a single data point far from the bulk of the data... To construct the plot in Figure 1 (left), we draw 1,000 red crosses by taking xn ∼ N(0, 1) i.i.d. and yn = −xn + ϵn with ϵn ∼ N(0, 1) i.i.d. Throughout, we use N(µ, σ²) to denote the normal distribution with mean µ and variance σ². The black dot appears at xn = yn = 10^6. We fit OLS with an intercept. The OLS-estimated slope on the full data set is nearly 1; after dropping the black point (less than 0.1% of the data set), the estimate is nearly −1, representing a sign change."
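The quoted setup can be reproduced in a few lines. This is a sketch with assumed details: the bulk is generated with slope −1 so that the reported flip from a slope near 1 (full data) to near −1 (outlier dropped) appears, and the seed and fitting routine are my choices, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                 # bulk covariates, x_n ~ N(0, 1)
y = -x + rng.normal(size=n)            # bulk responses: slope -1 plus N(0, 1) noise
x_full = np.append(x, 1e6)             # one high-leverage point at (10^6, 10^6)
y_full = np.append(y, 1e6)

slope_full = np.polyfit(x_full, y_full, 1)[0]   # OLS with intercept, all points
slope_dropped = np.polyfit(x, y, 1)[0]          # refit with the outlier removed
print(f"full: {slope_full:.2f}, dropped: {slope_dropped:.2f}")
```

The single point at (10^6, 10^6) dominates the least-squares objective, pulling the full-data slope to roughly 1; removing that one point (under 0.1% of the data) restores the bulk slope of roughly −1, the sign change described above.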