New Algorithms for the Learning-Augmented k-means Problem
Authors: Junyu Huang, Qilong Feng, Ziyun Huang, Zhen Zhang, Jinhui Xu, Jianxin Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments show that our proposed methods are faster than the state-of-the-art learning-augmented k-means algorithms with comparable performances on clustering quality. In this section, we give empirical evaluations on the performances of our proposed algorithms. |
| Researcher Affiliation | Academia | Junyu Huang School of Computer Science and Engineering, Central South University Xiangjiang Laboratory Changsha, China EMAIL; Qilong Feng School of Computer Science and Engineering, Central South University Xiangjiang Laboratory Changsha, China EMAIL; Ziyun Huang Department of Computer Science and Software Engineering, Penn State Erie, The Behrend College Erie, United States EMAIL; Zhen Zhang School of Advanced Interdisciplinary Studies, Hunan University of Technology and Business Xiangjiang Laboratory Changsha, China EMAIL; Jinhui Xu School of Information Science and Technology, University of Science and Technology of China Hefei, Anhui, China EMAIL; Jianxin Wang School of Computer Science and Engineering, Central South University Hunan Provincial Key Lab on Bioinformatics Xiangjiang Laboratory Changsha, China EMAIL |
| Pseudocode | Yes | Algorithm 1 Fast-Sampling; Algorithm 2 Fast-Estimation; Algorithm 3 Fast-Filtering; Algorithm 4 Fast-Filtering (modified); Algorithm 5 Fast-Sampling (k-median) |
| Open Source Code | No | The text does not contain any explicit statements about releasing code, a link to a code repository, or mention of code being available in supplementary materials for the methodology described in the paper. |
| Open Datasets | Yes | Following the work in Nguyen et al. (2022) and Ergun et al. (2021), we test the algorithms on datasets CIFAR10 (m = 10,000, d = 3,072), PHY (m = 10,000, d = 50) and MNIST (m = 1,797, d = 64) with varying error rate α and the number of clusters k. We also test the performances of the algorithms on other large datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu/), including SUSY (m = 5,000,000, d = 18) and HIGGS (m = 11,000,000, d = 27), and one large-scale dataset SIFT (m = 100,000,000, d = 128) from Matsui et al. (2017). |
| Dataset Splits | No | The paper describes how corrupted labeling partitions are generated for the predictors but does not provide specific details on training, validation, or test dataset splits for the main data points, such as percentages or sample counts. |
| Hardware Specification | Yes | All algorithms are implemented and executed in Python. The experiments were done on a machine with i7-12700KF processor and 256GB RAM. |
| Software Dependencies | No | The paper only mentions 'Python' without specifying its version or the versions of any other key software libraries or dependencies used for the implementation. |
| Experiment Setup | Yes | For the Fast-Sampling algorithm, the sample size is set to 4, and we fix ϵ = 1. For the Fast-Filtering and Fast-Estimation algorithms, we fix R1 = 10, R2 = m/20 and ϵ = 0.3, where m is the size of the given clustering instance. For each algorithm (including those in Ergun et al. (2021); Nguyen et al. (2022) as well as ours), we iterate over 15 possible values of α uniformly distributed in the interval [0.01, 0.5] as the inputs for the algorithms. |
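The experiment-setup row can be sketched in a few lines of Python. This is a hypothetical illustration of the reported configuration, not the authors' code: the function and parameter names (`run_algorithm`, `sample_size`, `R1`, `R2`) are placeholders, and only the numeric settings (sample size 4, ϵ = 1 for Fast-Sampling; R1 = 10, R2 = m/20, ϵ = 0.3 for Fast-Filtering/Fast-Estimation; 15 α values uniform in [0.01, 0.5]) come from the paper.

```python
# 15 candidate error rates, uniformly spaced over [0.01, 0.5],
# tried as inputs for every algorithm under comparison.
ALPHAS = [0.01 + i * (0.5 - 0.01) / 14 for i in range(15)]

# Fixed settings reported in the paper; key names are illustrative.
FAST_SAMPLING_PARAMS = {"sample_size": 4, "epsilon": 1.0}

def filtering_params(m):
    """Fixed settings shared by Fast-Filtering and Fast-Estimation,
    where m is the size of the clustering instance."""
    return {"R1": 10, "R2": m // 20, "epsilon": 0.3}

def sweep(run_algorithm, m):
    """Run a (hypothetical) algorithm once per candidate error rate."""
    params = filtering_params(m)
    return [run_algorithm(alpha=a, **params) for a in ALPHAS]
```

The grid over α mirrors how the comparison treats the error rate as an unknown tuning input rather than assuming the true corruption level is given.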