Quantum (Inspired) $D^2$-sampling with Applications
Authors: Poojan Shah, Ragesh Jaiswal
ICLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental investigations show promising results for QI-kmeans++ on large datasets with bounded aspect ratio. Finally, we use our quantum D2-sampling with the known D2-sampling-based classical approximation scheme to obtain the first quantum approximation scheme for the k-means problem with polylogarithmic running time dependence on N. |
| Researcher Affiliation | Academia | Poojan Chetan Shah and Ragesh Jaiswal Department of Computer Science and Engineering Indian Institute of Technology Delhi EMAIL |
| Pseudocode | Yes | Algorithm 1 QI-k-means++(V, k) 1: Setup sample-query access for dataset V 2: C random data point in V 3: for i = 2 to k do 4: Use sample-query access for w (which uses sample-query access for u1, ..., ui 1, which in turn uses sample-query access for V and c1, ..., ci 1) to D2-sample a center c. The vectors w and u1, ..., uk are as defined in (3) and (4). 5: C C {c}; 6: end for 7: return C |
| Open Source Code | No | We implemented our code in C++ for both k-means++ and QI-k-means++ and the results are averaged over 5 runs. |
| Open Datasets | Yes | We study the runtime (including setup time for the sample and query data structure) of QI-k-means++ on two extreme types of datasets to demonstrate its possible use cases. The first is binarized MNIST Salakhutdinov & Murray (2008) (70,000 data points, each being a 28 28 pixel image . Each pixel has a value of either 0 or 1, as opposed to the original MNIST, which had values from 0 to 255) and the second is IRIS Fisher (1988) (150 data points with four feature values). |
| Dataset Splits | No | No preprocessing/dimensionality reductions were done on any of the datasets used. |
| Hardware Specification | Yes | The experiments were performed on an AMD Ryzen 9 5900HX 4.68 GHz 8 Core processor (parallelization on multicore was not used). |
| Software Dependencies | No | We implemented our code in C++ for both k-means++ and QI-k-means++ and the results are averaged over 5 runs. |
| Experiment Setup | No | We implement the QI-k-means++ algorithm to compare the k-means cost and time with the classical implementation of k-means++ that has a running time O(Nkd). |