Clustering via Self-Supervised Diffusion

Authors: Roy Uziel, Irit Chelly, Oren Freifeld, Ari Pakman

ICML 2025

Reproducibility variables, with the result and supporting LLM response for each:
Research Type: Experimental
  "Extensive evaluations on challenging datasets demonstrate that CLUDI achieves state-of-the-art performance in unsupervised classification, setting new benchmarks in clustering robustness and adaptability to complex data distributions." Section 6 (Experiments) opens: "Datasets. We evaluate CLUDI on a comprehensive suite of benchmark datasets to rigorously assess its scalability, adaptability, and clustering performance."
Researcher Affiliation: Academia
  "1 Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel; 2 Data Science Research Center, Ben-Gurion University of the Negev, Beer Sheva, Israel; 3 The School of Brain Sciences and Cognition, Ben-Gurion University of the Negev, Beer Sheva, Israel; 4 Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer Sheva, Israel."
Pseudocode: No
  The paper describes the methodology using mathematical equations and textual explanations, but does not include a distinct section or figure labeled "Pseudocode" or "Algorithm" with structured steps.
Open Source Code: No
  The paper does not contain an explicit statement about the release of source code, nor does it provide a direct link to a code repository.
Open Datasets: Yes
  The datasets include subsets of ImageNet (Deng et al., 2009), Oxford-IIIT Pets (Parkhi et al., 2012) (with K = 32), Oxford 102 Flower (Nilsback & Zisserman, 2008), Caltech 101 (Fei-Fei et al., 2004), CIFAR-10 (Krizhevsky et al., 2009), and STL-10 (Coates et al., 2011).
Dataset Splits: Yes
  The paper evaluates CLUDI on standard benchmark datasets such as ImageNet, CIFAR-10, and STL-10. Figure 2 shows "Results from ImageNet 100 validation data", and Appendix A states, "All the curves shown correspond to performance metrics evaluated on the validation set of ImageNet 100", indicating the use of standard validation splits for these well-established datasets.
Hardware Specification: No
  The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies: No
  The paper does not provide version numbers for any software dependencies, programming languages, or libraries used in the implementation.
Experiment Setup: Yes
  The paper specifies key hyperparameters in Appendix A: "The model requires three hyperparameters: the embedding dimension d, the noise rescaling factor F^2 in Equation 6, and the coefficient λ on the loss in Equation 27. A systematic scan on the validation set of ImageNet 100 yielded the optimal values d = 64, F^2 = 25 and λ = 50, which we adopted for all the datasets with K ≤ 100. For datasets with fewer clusters (ImageNet 50, Oxford-IIIT Pets, STL-10, CIFAR-10) we used a smaller embedding d = 32."
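The quoted hyperparameter rule can be summarized in a small configuration sketch. This is purely illustrative: the function and key names (`cludi_config`, `embedding_dim`, `noise_rescale`, `loss_coeff`) are hypothetical, since no official code is released, and the K ≤ 50 cutoff for the smaller embedding is an assumption inferred from the datasets listed in the quote (ImageNet 50, Oxford-IIIT Pets, STL-10, CIFAR-10).

```python
# Hypothetical configuration sketch of the hyperparameters reported in
# Appendix A of the CLUDI paper. All names are illustrative; the paper
# does not release code, and the K <= 50 cutoff is an assumption.

def cludi_config(num_clusters: int) -> dict:
    """Return the reported hyperparameters for a dataset with K clusters."""
    # d = 64 was adopted for datasets with K <= 100; a smaller d = 32 was
    # used for datasets with fewer clusters (assumed here to mean K <= 50).
    embedding_dim = 32 if num_clusters <= 50 else 64
    return {
        "embedding_dim": embedding_dim,  # d
        "noise_rescale": 25,             # F^2 in Equation 6
        "loss_coeff": 50,                # λ on the loss in Equation 27
    }

print(cludi_config(100))  # e.g. ImageNet 100
print(cludi_config(10))   # e.g. CIFAR-10 or STL-10
```

Under these assumptions, ImageNet 100 (K = 100) would use d = 64 while CIFAR-10 and STL-10 (K = 10) would use d = 32, with F^2 = 25 and λ = 50 shared across all datasets.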