Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination
Authors: Ilias Diakonikolas, Giannis Iakovidis, Daniel Kane, Thanasis Pittas
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. ... Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. ... Specifically, we establish the following theorem: Theorem 1.2. (Main Algorithmic Result) ... In this section, we present our algorithm (Algorithm 1) and establish Theorem 1.2. |
| Researcher Affiliation | Academia | 1University of Wisconsin-Madison 2University of California, San Diego. Correspondence to: Giannis Iakovidis <EMAIL>, Thanasis Pittas <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Mean Estimation under Definition 1.1 1: Input: Accuracy ϵ > 0. 2: Output: bµ Rd such that bµ µ ϵ. 3: Fix C a sufficiently large constant, n0=Cd, n1=(d/ϵ2+o(1)) log C(d), and n2=2C/ϵ2 log C(d). 4: /*Rough Estimation:*/ |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that source code for the described methodology is publicly available. The 'Impact Statement' confirms its theoretical nature, making code release less common. |
| Open Datasets | No | The paper describes a theoretical 'Mean-Shift Contamination Model' (Definition 1.1) for generating synthetic data points from Gaussian distributions, rather than using or providing access to publicly available, pre-existing datasets. There are no mentions of specific named datasets, links, DOIs, or repositories for data. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and analysis for data generated under a specific model. It does not involve empirical experiments on real or synthetic datasets that would require specifying training, validation, or test splits. |
| Hardware Specification | No | The paper presents a theoretical algorithm and its analysis (sample complexity, runtime). It does not describe any empirical experiments, and therefore, no hardware specifications for running such experiments are mentioned. |
| Software Dependencies | No | The paper is theoretical, describing an algorithm (Algorithm 1) and its mathematical analysis. It does not detail an implementation or list specific software packages, libraries, or their version numbers that would be required to reproduce any empirical results. |
| Experiment Setup | No | The paper is theoretical, presenting an algorithm and its analysis for robust mean estimation. It does not include an experimental section detailing specific hyperparameters, training configurations, or system-level settings, as no empirical evaluation is performed. |