Learning from Sample Stability for Deep Clustering

Authors: Zhixin Li, Yuheng Jia, Hui Liu, Junhui Hou

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments across benchmark datasets showcase that incorporating sample stability into training can improve the performance of deep clustering. The code is available at https://github.com/LZX-001/LFSS.
Researcher Affiliation Academia 1School of Computer Science and Engineering, Southeast University, Nanjing 210096, China 2Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China 3School of Computing and Information Sciences, Saint Francis University, Hong Kong, China 4Department of Computer Science, City University of Hong Kong, Hong Kong, China. Correspondence to: Yuheng Jia <EMAIL>.
Pseudocode Yes Algorithm 1 Proposed LFSS
Open Source Code Yes The code is available at https://github.com/LZX-001/LFSS.
Open Datasets Yes We conduct experiments on multiple commonly used datasets, including CIFAR-10 (Krizhevsky, 2009), CIFAR-20 (Krizhevsky, 2009), STL-10 (Coates et al., 2011), ImageNet-10 (Chang et al., 2017), ImageNet-Dogs (Chang et al., 2017), Tiny-ImageNet (Le & Yang, 2015) and ImageNet-1K (Deng et al., 2009).
Dataset Splits No The paper lists standard benchmark datasets such as CIFAR-10, CIFAR-20, STL-10, ImageNet-10, ImageNet-Dogs, Tiny-ImageNet, and ImageNet-1K, and mentions 'We train models on STL-10 with extra unlabeled data.' However, it does not explicitly describe the specific training/test/validation splits used for each dataset or how they were applied in the experimental setup for the deep learning model training phase.
Hardware Specification Yes All experiments are conducted based on PyTorch and all models are trained on an NVIDIA RTX 4090 GPU.
Software Dependencies No The paper states 'All experiments are conducted based on PyTorch' but does not provide a specific version number for PyTorch or any other software dependencies.
Experiment Setup Yes We adopt ResNet-18 as the backbone unless otherwise specified. We train the models for 1,000 epochs with a batch size of 256, unless noted otherwise. We adopt the stochastic gradient descent (SGD) optimizer and the cosine decay learning rate schedule to effectively train our model. Besides, we adopt data augmentation methods following (Chen et al., 2020). We empirically set the trade-off hyper-parameter λ in Eq. (10) to 0.1 for all experiments unless otherwise specified. We set the unstable ratio δ to 0.1 for all experiments to exclude the most unstable samples in the cluster-level loss for LFSS, as indicated by the results in Observation 1. We set the warmup epoch number η to 200 for CIFAR-10 and ImageNet-10, and to 500 for CIFAR-20, STL-10 and ImageNet-Dogs. The noise intensity σ in Eq. (9) is set to 0.01 for STL-10 and ImageNet-10, and to 0.001 for CIFAR-10, CIFAR-20 and ImageNet-Dogs.
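The reported setup can be summarized as a plain-Python sketch: the quoted hyperparameters collected in one config dict, plus the standard cosine decay learning rate formula the paper says it uses. This is a minimal illustration, not the authors' code; the base learning rate `lr_base` is an assumption (the excerpt does not state it), and the function and key names are hypothetical.

```python
import math

def cosine_decay_lr(epoch, total_epochs=1000, lr_base=0.05):
    """Standard cosine decay schedule: lr_base at epoch 0, decaying to 0
    at total_epochs. lr_base is an assumed value, not from the paper."""
    return lr_base * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

# Hyperparameters as quoted in the setup description above.
config = {
    "backbone": "ResNet-18",
    "batch_size": 256,
    "epochs": 1000,
    "optimizer": "SGD",
    "lambda_tradeoff": 0.1,   # trade-off hyper-parameter in Eq. (10)
    "unstable_ratio": 0.1,    # delta: excludes most unstable samples from cluster-level loss
    "warmup_epochs": {        # eta, per dataset
        "CIFAR-10": 200, "ImageNet-10": 200,
        "CIFAR-20": 500, "STL-10": 500, "ImageNet-Dogs": 500,
    },
    "noise_sigma": {          # sigma in Eq. (9), per dataset
        "STL-10": 0.01, "ImageNet-10": 0.01,
        "CIFAR-10": 0.001, "CIFAR-20": 0.001, "ImageNet-Dogs": 0.001,
    },
}
```

In a PyTorch training script, the same schedule would typically be realized with `torch.optim.SGD` plus `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)`.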