The EM Algorithm is Adaptively-Optimal for Unbalanced Symmetric Gaussian Mixtures
Authors: Nir Weinberger, Guy Bresler
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the empirical iteration based on n samples, we show that when initialized at θ0 = 0, the EM algorithm adaptively achieves the minimax error rate O(min{(1/(1−2δ∗))·√(d/n), (d/n)^{1/4}}) in no more than O(1/(θ∗(1−2δ∗))) iterations (with high probability). We also consider the EM iteration for estimating the weight δ∗, assuming a fixed mean θ (which is possibly mismatched to θ∗). For the empirical iteration of n samples, we show that the minimax error rate O((1/θ)·√(d/n)) is achieved in no more than O(1/θ²) iterations. These results robustify and complement recent results of Wu and Zhou (2019) obtained for the equal-weights case δ∗ = 1/2. ... Figure 1 illustrates several EM iterations (based on single runs of n = 10^4 samples). |
| Researcher Affiliation | Academia | Nir Weinberger (EMAIL), The Viterbi Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa 3200004, Israel; Guy Bresler (EMAIL), Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA |
| Pseudocode | No | The paper describes the evolution of the EM iteration via mathematical equations (8) and (9) and detailed textual descriptions, but it does not present these steps as structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | No | The paper focuses on theoretical analysis of Gaussian mixture models and uses 'n samples' (implicitly synthetic data) for illustrative purposes. It does not mention or provide access information for any specific, publicly available dataset. |
| Dataset Splits | No | The paper discusses theoretical properties of the EM algorithm and illustrates them with 'single runs of n = 10^4 samples'. It does not specify any training, validation, or test dataset splits, as it does not conduct experiments on pre-existing datasets. |
| Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU models, or cloud resources) used for the mentioned 'single runs of n = 10^4 samples' or any other computational tasks. |
| Software Dependencies | No | The paper does not specify any software dependencies, libraries, or version numbers needed to replicate the theoretical analysis or the illustrative 'single runs of n = 10^4 samples'. |
| Experiment Setup | No | The paper primarily presents theoretical analysis and mathematical derivations. While it discusses aspects like initialization (e.g., 'initialized at θ0 = 0') for the EM algorithm itself, these are part of the algorithm's definition and theoretical properties, not specific experimental setup details (like learning rates, batch sizes, or optimizer settings) for an empirical study. |
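For context on what the reviewed iteration looks like in practice: the paper gives the EM update only as equations (8) and (9), not as code. Below is a minimal one-dimensional sketch (not the paper's implementation) of the standard EM update for the symmetric mixture (1−δ)·N(θ∗, 1) + δ·N(−θ∗, 1) with the weight δ assumed known; the function name `em_mean` and all parameter choices are illustrative assumptions.

```python
import numpy as np

def em_mean(x, delta, theta0=0.0, iters=50):
    """Sketch of the EM iteration for the mean of the symmetric mixture
    (1-delta)*N(theta, 1) + delta*N(-theta, 1), with delta known.
    For delta = 1/2 the update reduces to the familiar
    theta <- mean(x * tanh(theta * x))."""
    # Half the log-odds of the larger-weight component; shifts the E-step.
    b = 0.5 * np.log((1.0 - delta) / delta)
    theta = theta0
    for _ in range(iters):
        # E-step and M-step combined: tanh(theta*x + b) is the posterior
        # difference P(+ component | x) - P(- component | x).
        theta = np.mean(x * np.tanh(theta * x + b))
    return theta

# Synthetic data in the spirit of the paper's single runs of n = 10^4 samples.
rng = np.random.default_rng(0)
n, delta, theta_star = 10_000, 0.3, 1.0
signs = np.where(rng.random(n) < delta, -1.0, 1.0)  # -theta* with prob. delta
x = signs * theta_star + rng.standard_normal(n)
print(em_mean(x, delta))  # close to theta* = 1.0
```

Note that the trivial initialization θ0 = 0 still moves on the first step, since tanh(b) > 0 and the sample mean concentrates around (1−2δ)·θ∗.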