Quantum (Inspired) $D^2$-sampling with Applications

Authors: Poojan Shah, Ragesh Jaiswal

ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental investigations show promising results for QI-kmeans++ on large datasets with bounded aspect ratio. Finally, we use our quantum D2-sampling with the known D2-sampling-based classical approximation scheme to obtain the first quantum approximation scheme for the k-means problem with polylogarithmic running time dependence on N.
Researcher Affiliation Academia Poojan Chetan Shah and Ragesh Jaiswal Department of Computer Science and Engineering Indian Institute of Technology Delhi EMAIL
Pseudocode Yes Algorithm 1 QI-k-means++(V, k) 1: Setup sample-query access for dataset V 2: C random data point in V 3: for i = 2 to k do 4: Use sample-query access for w (which uses sample-query access for u1, ..., ui 1, which in turn uses sample-query access for V and c1, ..., ci 1) to D2-sample a center c. The vectors w and u1, ..., uk are as defined in (3) and (4). 5: C C {c}; 6: end for 7: return C
Open Source Code No We implemented our code in C++ for both k-means++ and QI-k-means++ and the results are averaged over 5 runs.
Open Datasets Yes We study the runtime (including setup time for the sample and query data structure) of QI-k-means++ on two extreme types of datasets to demonstrate its possible use cases. The first is binarized MNIST Salakhutdinov & Murray (2008) (70,000 data points, each being a 28 28 pixel image . Each pixel has a value of either 0 or 1, as opposed to the original MNIST, which had values from 0 to 255) and the second is IRIS Fisher (1988) (150 data points with four feature values).
Dataset Splits No No preprocessing/dimensionality reductions were done on any of the datasets used.
Hardware Specification Yes The experiments were performed on an AMD Ryzen 9 5900HX 4.68 GHz 8 Core processor (parallelization on multicore was not used).
Software Dependencies No We implemented our code in C++ for both k-means++ and QI-k-means++ and the results are averaged over 5 runs.
Experiment Setup No We implement the QI-k-means++ algorithm to compare the k-means cost and time with the classical implementation of k-means++ that has a running time O(Nkd).