Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Clustering with Semidefinite Programming and Fixed Point Iteration
Authors: Pedro Felzenszwalb, Caroline Klivans, Alice Paul
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that using fixed point iteration for rounding the Max k-Cut SDP relaxation leads to significantly better results when compared to randomized rounding. In Section 6 we compare our fixed point iteration method to the randomized rounding procedure in several examples, showing that the fixed point approach can produce much better clusterings in practice. Finally, in Section 6 we illustrate experimental results of our new rounding method and compare them to the randomized rounding approach from Frieze and Jerrum (1997). We also compare the result of our method to the k-means algorithm for clustering images in the MNIST handwritten digit dataset. |
| Researcher Affiliation | Academia | Pedro Felzenszwalb, School of Engineering and Department of Computer Science, Brown University, Providence, RI 02912, USA; Caroline Klivans, Division of Applied Mathematics, Brown University, Providence, RI 02912, USA; Alice Paul, Department of Biostatistics, Brown University, Providence, RI 02912, USA |
| Pseudocode | No | The paper describes algorithms and methods in prose and mathematical formulations but does not present any explicitly labeled pseudocode blocks or algorithms in a structured, code-like format. |
| Open Source Code | No | The paper mentions "The algorithms were implemented in Python" but does not provide any statement about releasing the code or a link to a repository for the methodology described. |
| Open Datasets | Yes | Section 6.2 MNIST digits: In this section we evaluate the performance of our clustering method on a subset of the MNIST handwritten digits dataset (Le Cun et al.). The full reference is provided: Yann Le Cun, Corinna Cortes, and Christopher Burges. MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/. |
| Dataset Splits | No | For synthetic data: "We performed several experiments using synthetic data in R^2...". For MNIST: "In each trial we selected 20 random examples for each of 5 digits (0, 1, 2, 3 and 4).". The paper describes data generation and sampling but does not provide specific train/test/validation splits with percentages, sample counts, or defined random seeds for reproducibility. |
| Hardware Specification | Yes | The algorithms were implemented in Python and run on a computer with an Intel i7 CPU @ 2.6 GHz with 8GB of RAM. |
| Software Dependencies | No | The algorithms were implemented in Python... We use the cvxpy package for convex optimization together with the SCS (splitting conic solver) package to solve SDPs. We used the scipy library implementation of k-means. While specific software packages are mentioned, no version numbers are provided for Python, cvxpy, SCS, or scipy. |
| Experiment Setup | Yes | The fixed point iteration method converged after 1 to 5 iterations in each case. The number of iterations until convergence in the first setting was between 3 and 10, with an average of 7.02. The number of iterations until convergence in the second setting was between 3 and 4, with an average of 3.01. Each trial involved 10 different random initializations of the initial cluster centers using the k-means++ method. |