Recovery of a Mixture of Gaussians by Sum-of-Norms Clustering

Authors: Tao Jiang, Stephen Vavasis, Chen Wen Zhai

JMLR 2020

Reproducibility assessment (variable, result, and supporting excerpt from the paper):
Research Type: Experimental
Evidence: "In this section, we perform experiments in which a solver for sum-of-norms clustering is applied to a set of points drawn from a mixture of Gaussians. Four experiments are performed to address four questions: (1) How flexibly can λ be chosen? (2) How does the recovery depend on d, the space dimension? (3) How does the recovery degrade as σ (the standard deviation of the Gaussians) increases? and (4) Does the result hold for general weights?"
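For context, the convex problem being solved is the standard sum-of-norms clustering formulation (as in Chi and Lange, 2015); the unit-weight version is shown here, with question (4) above concerning the generalization to arbitrary pairwise weights:

```latex
\min_{x_1,\dots,x_n \in \mathbb{R}^d}\;
\frac{1}{2}\sum_{i=1}^{n} \|x_i - a_i\|^2
\;+\; \lambda \sum_{1 \le i < j \le n} \|x_i - x_j\|
```

At the optimum, points i and j are assigned to the same cluster exactly when x_i = x_j, and λ controls how aggressively the fusion penalty merges points.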
Researcher Affiliation: Academia
Evidence: Tao Jiang (School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14850, USA); Stephen Vavasis (Department of Combinatorics & Optimization, University of Waterloo, Waterloo, ON N2L 3G1, Canada); Chen Wen Zhai (Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA)
Pseudocode: No
Evidence: "The paper contains extensive mathematical derivations and proofs, particularly in Section 2, but does not include any clearly labeled pseudocode or algorithm blocks."
Open Source Code: No
Evidence: "In all cases, the code used is our own Julia (Bezanson et al., 2017) implementation of the ADMM solver by Chi and Lange (2015)."
Open Datasets: No
Evidence: "Let the vertices a_1, …, a_n ∈ R^d be generated from a mixture of K Gaussian distributions with parameters µ_1, …, µ_K, σ_1^2, …, σ_K^2, and w_1, …, w_K."
Dataset Splits: No
Evidence: "Let the vertices a_1, …, a_n ∈ R^d be generated from a mixture of K Gaussian distributions with parameters µ_1, …, µ_K, σ_1^2, …, σ_K^2, and w_1, …, w_K. Let θ > 0 be given, and let V_m = {i : ‖a_i − µ_m‖ ≤ θσ_m}, m = 1, …, K." The paper generates synthetic data rather than using a pre-existing dataset with explicit train/test/validation splits.
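A minimal sketch of the synthetic data-generation procedure described above, assuming spherical Gaussians with per-component standard deviations; the example parameters at the bottom are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, mus, sigmas, weights, rng):
    """Draw n points from a mixture of K spherical Gaussians in R^d.

    mus: (K, d) means; sigmas: (K,) std devs; weights: (K,) mixture weights.
    Returns the points a (n, d) and the true component labels z (n,).
    """
    K, d = mus.shape
    z = rng.choice(K, size=n, p=weights)                    # component of each point
    a = mus[z] + sigmas[z][:, None] * rng.standard_normal((n, d))
    return a, z

# Hypothetical two-component example in R^2.
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
sigmas = np.array([0.5, 0.5])
a, z = sample_mixture(500, mus, sigmas, np.array([0.5, 0.5]), rng)

# V_m = {i : ||a_i - mu_m|| <= theta * sigma_m}: the theta-radius "core" of
# component m, used in the paper's recovery statements.
theta = 2.0
V = [np.where(np.linalg.norm(a - mus[m], axis=1) <= theta * sigmas[m])[0]
     for m in range(len(mus))]
```

Each V_m collects the indices of points lying within θ standard deviations of the mth mean, which is the set the recovery guarantee speaks about.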
Hardware Specification: Yes
Evidence: "With this tolerance, the runs described below took approximately 27 hours total on an Intel Xeon processor single-threaded."
Software Dependencies: No
Evidence: "In all cases, the code used is our own Julia (Bezanson et al., 2017) implementation of the ADMM solver by Chi and Lange (2015). Our convergence tolerance ϵ_tol was taken to be 10^−6 in all cases. This tolerance corresponds to the quantities ϵ_pri and ϵ_rel in the supplemental material of Chi and Lange (2015)." The paper mentions software and methods but does not provide specific version numbers for software dependencies.
Experiment Setup: Yes
Evidence: "For this experiment we chose n = 1000, d = K = 6, w_i = 1/6 and µ_i = e_i (the ith column of the identity matrix) for i = 1, …, 6, σ = 0.0094, and θ = 2.0. This choice of σ is made so that the upper and lower bounds on λ in Theorem 3 are nearly equal to a single value λ̄ = 7.0 × 10^−4. Then we tested recovery for λ = κλ̄ with κ = 1/4, 1/2, 1, 2, 4, as shown in Table 6. … Our convergence tolerance ϵ_tol was taken to be 10^−6 in all cases."
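The first experiment's inputs can be reconstructed directly from the quoted parameters. This sketch sets up the data and the λ grid only; the sum-of-norms solver itself (the authors' Julia ADMM implementation) is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Parameters from the paper's first experiment.
n, d, K = 1000, 6, 6
sigma, theta = 0.0094, 2.0
weights = np.full(K, 1.0 / K)        # w_i = 1/6
mus = np.eye(K)                      # mu_i = e_i, the ith column of the identity

# Each point picks a component uniformly, then gets spherical Gaussian noise.
z = rng.choice(K, size=n, p=weights)
a = mus[z] + sigma * rng.standard_normal((n, d))

# lambda_bar: the single value where the bounds of Theorem 3 nearly coincide,
# and the multiples kappa * lambda_bar tested in Table 6.
lam_bar = 7.0e-4
lams = [kappa * lam_bar for kappa in (0.25, 0.5, 1.0, 2.0, 4.0)]
```

With σ this small relative to the unit spacing of the means e_1, …, e_6, the six clusters are well separated, which is what makes recovery possible across the whole tested range of λ.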