Sketch and shift: a robust decoder for compressive clustering

Authors: Ayoub Belhadji, Rémi Gribonval

TMLR 2024

Reproducibility assessment (Variable — Result — LLM Response):
Research Type: Experimental — "In this work, we undertake a scrutinized examination of CL-OMPR to circumvent its limitations. In particular, we show how this algorithm can fail to recover the clusters even in advantageous scenarios. ... To address these limitations, we propose an alternative decoder offering substantial improvements over CL-OMPR. ... The proposed algorithm can extract clustering information from a sketch of the MNIST (resp. of the CIFAR10) dataset that is 10 times smaller than previously and much easier to tune." (Section 5: Numerical simulations)
Researcher Affiliation: Academia — Ayoub Belhadji (EMAIL), Univ Lyon, ENS de Lyon, Inria, CNRS, UCBL, LIP UMR 5668, Lyon, France; Rémi Gribonval (EMAIL), Univ Lyon, ENS de Lyon, Inria, CNRS, UCBL, LIP UMR 5668, Lyon, France
Pseudocode: Yes — Algorithm 1: CL-OMPR; Algorithm 2: Proposed decoder
Open Source Code: No — The paper states: "The dataset and the code to generate a similar one can be downloaded from https://openreview.net/forum?id=6rWuWbVmgz." This refers to code for generating a synthetic dataset, not to source code for the proposed decoder or for CL-OMPR. The paper does not state that an implementation of its primary algorithms is released.
Open Datasets: Yes — "The proposed algorithm can extract clustering information from a sketch of the MNIST (resp. of the CIFAR10) dataset..." "In this section, we investigate whether the observations of Section 5.2 hold for real datasets. For this purpose, we perform experiments on spectral features of the MNIST dataset... The resulting matrix can be downloaded from https://gitlab.com/dzla/Spectral MNIST." Appendix C.2, The case of CIFAR-10: "we perform experiments on the training set of the CIFAR-10 dataset"
Dataset Splits: No — The paper describes using the full MNIST dataset (N = 70000) and the training set of the CIFAR-10 dataset (N = 60000) for clustering experiments. It does not specify any training/validation/test splits, which are typically used for supervised learning; clustering is an unsupervised task.
Hardware Specification: No — The paper does not give specific hardware details (GPU/CPU models, processors, or memory) for its experiments. It mentions that previous work was done "on a single laptop" and thanks "the Blaise Pascal Center (CBP) for the computational means", but neither is a concrete hardware specification for the reported experiments.
Software Dependencies: No — The paper mentions using the Python package Pycle without specifying a version number. It also references other tools such as ResNet18 (model architecture), SGD (optimizer), Vlfeat, and SIDUS in the bibliography, but does not list versioned software dependencies for its own implementation.
Experiment Setup: Yes — "We consider a dataset X = {x_1, ..., x_N} ⊂ ℝ², where N = 100000 and the x_i are i.i.d. draws from a mixture of isotropic Gaussians ∑_{i=1}^{k} α_i N(c_i, Σ_i), where k = 3, α_1 = α_2 = α_3 = 1/3, c_1, c_2, c_3 ∈ ℝ², and Σ_1 = ⋯ = Σ_k = σ_X² I_2 ∈ ℝ^{2×2}. ... for sketch sizes m ∈ {30, 1000}, averaged over 50 realizations of the sketching operator, as a function of the bandwidth σ. ... For all algorithms we use as a domain Θ the hypercube Θ = [−1, 1]⁶. ... For the three compressive algorithms, we take T = 2k. Moreover, for the two variants of Algorithm 2, we take L = 10000 random initializations..." Appendix C.2: "The network is trained on the training set of CIFAR-10 for 50 epochs with SGD with momentum 0.9, learning rate 0.1, learning rate decay 0.1, batch size 512 and weight decay 5e-4."
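To make the reported setup concrete, the following is a minimal sketch of the synthetic experiment described above: a mixture of k = 3 isotropic Gaussians in ℝ² with equal weights, followed by a random-Fourier-feature sketch of size m, the standard encoder in compressive clustering. The center placement, σ_X, the bandwidth σ, and the function names are illustrative assumptions, not the paper's exact protocol or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset per the reported setup: N i.i.d. draws from a mixture
# of k = 3 isotropic Gaussians in R^2 with weights alpha_i = 1/3.
N, k, d = 100_000, 3, 2
sigma_X = 0.1                                   # assumed cluster spread
centers = rng.uniform(-1.0, 1.0, size=(k, d))   # assumed c_1, c_2, c_3 in Theta
labels = rng.integers(k, size=N)
X = centers[labels] + sigma_X * rng.standard_normal((N, d))

def sketch(X, m, sigma, rng):
    """Random-Fourier-feature sketch z = (1/N) sum_i exp(i Omega^T x_i),
    with m frequencies drawn for a Gaussian kernel of bandwidth sigma."""
    d = X.shape[1]
    Omega = rng.standard_normal((d, m)) / sigma  # frequency matrix
    return np.exp(1j * X @ Omega).mean(axis=0), Omega

# m = 30 is the smaller of the two sketch sizes used in the paper's sweep.
z, Omega = sketch(X, m=30, sigma=0.3, rng=rng)
print(z.shape)  # (30,)
```

A decoder (CL-OMPR or the proposed alternative) would then recover the k centers from z alone; each entry of z is an average of unit-modulus complex numbers, so |z_j| ≤ 1 always holds.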