OneBatchPAM: A Fast and Frugal K-Medoids Algorithm
Authors: Antoine de Mathelin, Nicolas Enrique Cecchi, François Deheeger, Mathilde Mougeot, Nicolas Vayatis
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Multiple experiments conducted on real datasets of various sizes and dimensions show that our algorithm provides performance similar to state-of-the-art methods such as Faster PAM and Bandit PAM++ with a drastically reduced running time. |
| Researcher Affiliation | Collaboration | 1. Centre Borelli, Université Paris-Saclay, CNRS, ENS Paris-Saclay; 2. Michelin |
| Pseudocode | No | The paper describes algorithms using equations and textual explanations, but no explicitly labeled 'Pseudocode' or 'Algorithm' block is present. |
| Open Source Code | Yes | Code: https://github.com/antoinedemathelin/obpam |
| Open Datasets | Yes | We conduct the experiments on the MNIST and CIFAR10 image datasets (Le Cun, Cortes, and Burges 1994; Krizhevsky, Hinton et al. 2009) and 8 UCI datasets (Dua and Graff 2017) |
| Dataset Splits | No | The paper mentions dividing datasets into 'small scale' and 'large scale' categories for experimental purposes, but does not provide specific train/test/validation splits, percentages, or sample counts for reproducibility. |
| Hardware Specification | Yes | The experiments are run on an 8 GB RAM computer with 4 cores. |
| Software Dependencies | No | The paper states: 'Our implementation of One Batch PAM is coded in Python with the Cython module.' and refers to 'official implementations of Bandit PAM++' and 'Python library kmedoids4'. However, no specific version numbers for Python, Cython, Bandit PAM++, or kmedoids are provided. |
| Experiment Setup | Yes | Experiments are performed for different values of k ∈ {10, 50, 100}. Each experiment is repeated 5 times to compute the standard deviations. For Bandit PAM++, three settings for the number of swap iterations are considered: T ∈ {0, 2, 5}. For Faster CLARA, two settings for the number of subsampling repetitions are considered: I ∈ {5, 50}; the sample size is set to m = 80 + 4k as suggested in (Schubert and Rousseeuw 2021). Three chain lengths are considered for kmc2: L ∈ {20, 100, 200}, and two numbers of local-search iterations for LS-kmeans++: Z ∈ {5, 10}. For One Batch PAM, a sample size of m = 100 log(kn) is used. |
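The setup row above quotes two sample-size formulas: m = 80 + 4k for Faster CLARA and m = 100 log(kn) for One Batch PAM. A minimal sketch of how the two batch sizes compare is below; the use of the natural logarithm, the rounding, and the MNIST-scale value n = 70000 are assumptions for illustration, not details taken from the paper.

```python
import math

def clara_sample_size(k: int) -> int:
    # Faster CLARA subsample size: m = 80 + 4k
    # (as suggested in Schubert and Rousseeuw 2021)
    return 80 + 4 * k

def obpam_sample_size(k: int, n: int) -> int:
    # One Batch PAM batch size: m = 100 log(kn)
    # Natural log and ceiling rounding are assumptions here.
    return math.ceil(100 * math.log(k * n))

# Compare the two sizes for the k values used in the experiments,
# on a hypothetical MNIST-scale dataset (n = 70000).
n = 70_000
for k in (10, 50, 100):
    print(f"k={k}: CLARA m={clara_sample_size(k)}, "
          f"OneBatchPAM m={obpam_sample_size(k, n)}")
```

Note that the One Batch PAM batch grows only logarithmically in k and n, which is consistent with the report's claim of a drastically reduced running time at large scale.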