reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalization Performance of Ensemble Clustering: From Theory to Algorithm

Authors: Xu Zhang, Haoye Qiu, Weixuan Liang, Hui Liu, Junhui Hou, Yuheng Jia

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	By extensive experimental validation, we confirm the validity of our theoretical assertions and demonstrate that the proposed algorithm surpasses other state-of-the-art methods significantly in terms of performance.
Researcher Affiliation	Academia	(1) School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; (2) College of Computer Science and Technology, National University of Defense Technology, Changsha, China; (3) School of Computing Information Sciences, Saint Francis University, Hong Kong, China; (4) Department of Computer Science, City University of Hong Kong, Hong Kong, China; (5) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China. Correspondence to: Yuheng Jia <EMAIL>.
Pseudocode	Yes	The pseudo code for this algorithm is provided in Appendix C. (Algorithm 1)
Open Source Code	Yes	The code is available at https://github.com/xuz2019/GPEC.
Open Datasets	Yes	We evaluated our method on 10 datasets with method CEAM (Zhou et al., 2024), CEs2L, CEs2Q (Li et al., 2019), LWEA (Huang et al., 2018), NWCA (Zhang et al., 2024), ECCMS (Jia et al., 2024), MKKM (Bang et al., 2018), SMKKM (Liu, 2023), SEC (Liu et al., 2017). Due to the space limitations, detailed descriptions of the datasets and comparison methods are provided in Appendix E.1 and E.2. E.1. Details of Datasets: In the comparative experiments in Section 6.1, we used 10 benchmark datasets including images, DNA, sensor information, etc. We have summarized the feature information of the datasets in Table 3, and the detailed information is as follows: 1. Phishing Websites: The dataset consists of a collection of legitimate and phishing website instances. ... (http://archive.ics.uci.edu/dataset/327/phishing+websites) 2. Rice: A total of 3810 images of rice grains were captured from two species: Cammeo and Osmancik rices. ... (http://archive.ics.uci.edu/dataset/545/rice+cammeo+and+osmancik)
Dataset Splits	No	The paper does not provide explicit training/test/validation dataset splits. It only states: "For each dataset, we repeat the experiments 20 times and compute the average performance. The true number of clustering class is chosen as k for each dataset."
Hardware Specification	No	The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers.
Experiment Setup	Yes	For each dataset, we repeat the experiments 20 times and compute the average performance. The true number of clustering class is chosen as k for each dataset. E.4. Hyper-parameter Analysis: In this paper, we have only one hyper-parameter, α, which serves as the threshold for extracting high-confidence elements. Fig. 4 shows the performance of our model under different α settings. It can be seen that our method is quite robust across most datasets, and the optimal hyper-parameter is generally between 0.1 and 0.3.
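
The evaluation protocol quoted in the Experiment Setup row (20 repetitions per dataset, averaged, with k set to the true number of classes) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `purity`, `evaluate`, and `random_clusterer` are hypothetical names introduced here, purity is only a stand-in metric (the row does not name the paper's metrics), and the random clusterer is a placeholder for an actual ensemble clustering method.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Dict, List, Sequence

N_REPEATS = 20  # from the setup row: each experiment is repeated 20 times


def purity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Illustrative metric (an assumption): share of points in their cluster's majority class."""
    clusters: Dict[int, List[int]] = {}
    for true_label, cluster_id in zip(y_true, y_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(y_true)


def evaluate(cluster_fn: Callable[[int, int], List[int]],
             y_true: Sequence[int],
             n_repeats: int = N_REPEATS) -> float:
    """Run cluster_fn(k, seed) n_repeats times and return the mean score."""
    k = len(set(y_true))  # the true number of classes is used as k
    scores = [purity(y_true, cluster_fn(k, seed)) for seed in range(n_repeats)]
    return statistics.mean(scores)


def random_clusterer(n_points: int) -> Callable[[int, int], List[int]]:
    """Hypothetical placeholder: assigns each point to a random cluster in [0, k)."""
    def cluster_fn(k: int, seed: int) -> List[int]:
        rng = random.Random(seed)
        return [rng.randrange(k) for _ in range(n_points)]
    return cluster_fn


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2]
    print(evaluate(random_clusterer(len(labels)), labels))
```

Seeding each repetition separately keeps the 20 runs distinct but repeatable, which matters here because the entry reports no dataset splits, hardware, or software versions to anchor a rerun.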