Clustering via Self-Supervised Diffusion
Authors: Roy Uziel, Irit Chelly, Oren Freifeld, Ari Pakman
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on challenging datasets demonstrate that CLUDI achieves state-of-the-art performance in unsupervised classification, setting new benchmarks in clustering robustness and adaptability to complex data distributions. The Experiments section (Section 6) further states: 'Datasets. We evaluate CLUDI on a comprehensive suite of benchmark datasets to rigorously assess its scalability, adaptability, and clustering performance.' |
| Researcher Affiliation | Academia | 1Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel 2Data Science Research Center, Ben Gurion University of the Negev, Beer Sheva, Israel 3The School of Brain Sciences and Cognition, Ben-Gurion University of the Negev, Beer Sheva, Israel 4Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer Sheva, Israel. |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but does not include a distinct section or figure labeled 'Pseudocode' or 'Algorithm' with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | The datasets include subsets of ImageNet (Deng et al., 2009), Oxford-IIIT Pets (Parkhi et al., 2012) (with K = 32), Oxford 102 Flower (Nilsback & Zisserman, 2008), Caltech 101 (Fei-Fei et al., 2004), CIFAR-10 (Krizhevsky et al., 2009), and STL-10 (Coates et al., 2011). |
| Dataset Splits | Yes | The paper evaluates CLUDI on standard benchmark datasets such as ImageNet, CIFAR-10, and STL-10. Figure 2 shows 'Results from ImageNet 100 validation data', and Appendix A states, 'All the curves shown correspond to performance metrics evaluated on the validation set of ImageNet 100.', indicating the use of standard validation splits for these well-defined datasets. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the implementation. |
| Experiment Setup | Yes | The paper specifies key hyperparameters in Appendix A: 'The model requires three hyperparameters: the embedding dimension d, the noise rescaling factor F^2 in Equation 6 and the coefficient λ on the loss in Equation 27. A systematic scan on the validation set of ImageNet 100 yielded the optimal values d = 64, F^2 = 25 and λ = 50, which we adopted for all the datasets with K <= 100. For datasets with fewer clusters (ImageNet 50, Oxford-IIIT Pets, STL-10, CIFAR-10) we used a smaller embedding d = 32.' |
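The reported setup can be collected into a small configuration helper. This is a hypothetical sketch, not the authors' code (which is not released): the function name, the dictionary keys, and the K ≤ 50 threshold for switching to the smaller embedding are assumptions inferred from the cluster counts of the listed datasets (ImageNet 50, Oxford-IIIT Pets with K = 32, STL-10, CIFAR-10), while the numeric values d, F^2, and λ come directly from Appendix A as quoted above.

```python
def cludi_hyperparameters(num_clusters: int) -> dict:
    """Return the hyperparameters reported in Appendix A of the CLUDI paper.

    NOTE: the <= 50 threshold is an assumption inferred from the datasets
    the paper lists as using the smaller embedding; the paper itself gives
    the rule only by example.
    """
    # Smaller embedding d = 32 for datasets with fewer clusters
    # (ImageNet 50, Oxford-IIIT Pets, STL-10, CIFAR-10); d = 64 otherwise.
    embedding_dim = 32 if num_clusters <= 50 else 64
    return {
        "embedding_dim": embedding_dim,  # d
        "noise_rescaling": 25,           # F^2 in Equation 6
        "loss_coefficient": 50,          # lambda in Equation 27
    }

print(cludi_hyperparameters(10))   # CIFAR-10
print(cludi_hyperparameters(100))  # ImageNet 100
```

A helper like this makes the embedding-size switch explicit in one place instead of scattering per-dataset constants through the training script.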