reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination

Authors: Ilias Diakonikolas, Giannis Iakovidis, Daniel Kane, Thanasis Pittas

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. ... Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. ... Specifically, we establish the following theorem: Theorem 1.2. (Main Algorithmic Result) ... In this section, we present our algorithm (Algorithm 1) and establish Theorem 1.2.
Researcher Affiliation	Academia	1University of Wisconsin-Madison 2University of California, San Diego. Correspondence to: Giannis Iakovidis <EMAIL>, Thanasis Pittas <EMAIL>.
Pseudocode	Yes	Algorithm 1 Mean Estimation under Definition 1.1 1: Input: Accuracy ϵ > 0. 2: Output: bµ Rd such that bµ µ ϵ. 3: Fix C a sufficiently large constant, n0=Cd, n1=(d/ϵ2+o(1)) log C(d), and n2=2C/ϵ2 log C(d). 4: /Rough Estimation:/
Open Source Code	No	The paper does not provide explicit statements or links indicating that source code for the described methodology is publicly available. The 'Impact Statement' confirms its theoretical nature, making code release less common.
Open Datasets	No	The paper describes a theoretical 'Mean-Shift Contamination Model' (Definition 1.1) for generating synthetic data points from Gaussian distributions, rather than using or providing access to publicly available, pre-existing datasets. There are no mentions of specific named datasets, links, DOIs, or repositories for data.
Dataset Splits	No	The paper is theoretical and focuses on algorithm design and analysis for data generated under a specific model. It does not involve empirical experiments on real or synthetic datasets that would require specifying training, validation, or test splits.
Hardware Specification	No	The paper presents a theoretical algorithm and its analysis (sample complexity, runtime). It does not describe any empirical experiments, and therefore, no hardware specifications for running such experiments are mentioned.
Software Dependencies	No	The paper is theoretical, describing an algorithm (Algorithm 1) and its mathematical analysis. It does not detail an implementation or list specific software packages, libraries, or their version numbers that would be required to reproduce any empirical results.
Experiment Setup	No	The paper is theoretical, presenting an algorithm and its analysis for robust mean estimation. It does not include an experimental section detailing specific hyperparameters, training configurations, or system-level settings, as no empirical evaluation is performed.