reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quantum (Inspired) $D^2$-sampling with Applications

Authors: Poojan Shah, Ragesh Jaiswal

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental investigations show promising results for QI-kmeans++ on large datasets with bounded aspect ratio. Finally, we use our quantum D2-sampling with the known D2-sampling-based classical approximation scheme to obtain the first quantum approximation scheme for the k-means problem with polylogarithmic running time dependence on N.
Researcher Affiliation	Academia	Poojan Chetan Shah and Ragesh Jaiswal Department of Computer Science and Engineering Indian Institute of Technology Delhi EMAIL
Pseudocode	Yes	Algorithm 1 QI-k-means++(V, k) 1: Setup sample-query access for dataset V 2: C random data point in V 3: for i = 2 to k do 4: Use sample-query access for w (which uses sample-query access for u1, ..., ui 1, which in turn uses sample-query access for V and c1, ..., ci 1) to D2-sample a center c. The vectors w and u1, ..., uk are as defined in (3) and (4). 5: C C {c}; 6: end for 7: return C
Open Source Code	No	We implemented our code in C++ for both k-means++ and QI-k-means++ and the results are averaged over 5 runs.
Open Datasets	Yes	We study the runtime (including setup time for the sample and query data structure) of QI-k-means++ on two extreme types of datasets to demonstrate its possible use cases. The first is binarized MNIST Salakhutdinov & Murray (2008) (70,000 data points, each being a 28 28 pixel image . Each pixel has a value of either 0 or 1, as opposed to the original MNIST, which had values from 0 to 255) and the second is IRIS Fisher (1988) (150 data points with four feature values).
Dataset Splits	No	No preprocessing/dimensionality reductions were done on any of the datasets used.
Hardware Specification	Yes	The experiments were performed on an AMD Ryzen 9 5900HX 4.68 GHz 8 Core processor (parallelization on multicore was not used).
Software Dependencies	No	We implemented our code in C++ for both k-means++ and QI-k-means++ and the results are averaged over 5 runs.
Experiment Setup	No	We implement the QI-k-means++ algorithm to compare the k-means cost and time with the classical implementation of k-means++ that has a running time O(Nkd).