Sketch and shift: a robust decoder for compressive clustering
Authors: Ayoub Belhadji, Rémi Gribonval
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we undertake a scrutinized examination of CL-OMPR to circumvent its limitations. In particular, we show how this algorithm can fail to recover the clusters even in advantageous scenarios. ... To address these limitations, we propose an alternative decoder offering substantial improvements over CL-OMPR. ... The proposed algorithm can extract clustering information from a sketch of the MNIST (resp. of the CIFAR10) dataset that is 10 times smaller than previously and much easier to tune. ... Section 5: Numerical simulations |
| Researcher Affiliation | Academia | Ayoub Belhadji EMAIL Univ Lyon, ENS de Lyon, Inria, CNRS, UCBL, LIP UMR 5668, Lyon, France Rémi Gribonval EMAIL Univ Lyon, ENS de Lyon, Inria, CNRS, UCBL, LIP UMR 5668, Lyon, France |
| Pseudocode | Yes | Algorithm 1: CL-OMPR Algorithm 2: Proposed decoder |
| Open Source Code | No | The paper mentions: "The dataset and the code to generate a similar one can be downloaded from https://openreview.net/forum?id=6rWuWbVmgz." This statement refers to code for generating a synthetic dataset, not explicitly the source code for the proposed decoder or CL-OMPR methodology. The primary algorithms of the paper are not stated to have their source code released. |
| Open Datasets | Yes | The proposed algorithm can extract clustering information from a sketch of the MNIST (resp. of the CIFAR10) dataset... In this section, we investigate whether the observations of Section 5.2 hold for real datasets. For this purpose, we perform experiments on spectral features of the MNIST dataset... The resulting matrix can be downloaded from https://gitlab.com/dzla/SpectralMNIST. Appendix C.2 The case of CIFAR-10... we perform experiments on the training set of the CIFAR-10 dataset |
| Dataset Splits | No | The paper describes using the full MNIST dataset (N=70000) and the training set of the CIFAR-10 dataset (N=60000) for clustering experiments. It does not specify any training/validation/test splits, which are typically used for supervised learning tasks, as clustering is an unsupervised task. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments. It mentions previous work was done "on a single laptop" and thanks "the Blaise Pascal Center (CBP) for the computational means", but these are not specific hardware specifications for the reported experiments. |
| Software Dependencies | No | The paper mentions the use of the Python package Pycle without specifying a version number. It also references other tools like ResNet18 (model architecture), SGD (optimizer), Vlfeat, and SIDUS in the bibliography, but does not list specific software dependencies with version numbers for its own implementation. |
| Experiment Setup | Yes | We consider a dataset X = {x₁, ..., x_N} ⊂ ℝ², where N = 100000 and the xᵢ are i.i.d. draws from a mixture of isotropic Gaussians Σᵢ₌₁ᵏ αᵢ N(cᵢ, Σᵢ), where k = 3, α₁ = α₂ = α₃ = 1/3, c₁, c₂, c₃ ∈ ℝ² and Σ₁ = ... = Σₖ = σ_X² I₂ ∈ ℝ²ˣ². ... for sketch sizes m ∈ {30, 1000}, averaged over 50 realizations of the sketching operator, as a function of the bandwidth σ. ... For all algorithms we use as a domain Θ the hypercube Θ = [-1, 1]⁶. ... For the three compressive algorithms, we take T = 2k. Moreover, for the two variants of Algorithm 2, we take L = 10000 random initializations... Appendix C.2: The network is trained on the training set of CIFAR-10 for 50 epochs with SGD with momentum 0.9, learning rate 0.1, learning rate decay 0.1, batch-size 512 and weight-decay 5e-4. |
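The quoted setup (a mixture of k = 3 isotropic Gaussians in ℝ² with N = 100000 points, sketched with a random-feature map of size m) can be sketched in a few lines of numpy. This is a hedged illustration, not the paper's code: the center locations, σ_X, and the bandwidth σ below are illustrative choices, and the complex-exponential sketch is one common random-Fourier-feature construction for compressive clustering, assumed here rather than copied from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, d = 100_000, 3, 2

# Mixture parameters from the quoted setup; centers and sigma_X are assumed values.
alphas = np.full(k, 1.0 / k)              # alpha_1 = alpha_2 = alpha_3 = 1/3
centers = rng.uniform(-1.0, 1.0, (k, d))  # c_1, c_2, c_3 in [-1, 1]^2 (illustrative)
sigma_X = 0.1                             # common isotropic std (illustrative)

# Draw N i.i.d. points from the Gaussian mixture.
labels = rng.choice(k, size=N, p=alphas)
X = centers[labels] + sigma_X * rng.standard_normal((N, d))

# A size-m random-Fourier-feature sketch with bandwidth sigma (assumed form):
# z = mean_j exp(i * Omega^T x_j), frequencies Omega drawn Gaussian and scaled by 1/sigma.
m, sigma = 30, 0.3
Omega = rng.standard_normal((d, m)) / sigma
sketch = np.exp(1j * X @ Omega).mean(axis=0)  # complex vector of length m
```

The sketch is a single length-m complex vector regardless of N, which is what allows the decoders compared in the paper to cluster without revisiting the full dataset.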