Mixture of Experts as Representation Learner for Deep Multi-View Clustering
Authors: Yunhe Zhang, Jinyu Cai, Zhihao Wu, Pengyang Wang, See-Kiong Ng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on various multi-view benchmark datasets demonstrate the superiority of DMVC-CE compared to state-of-the-art MVC baselines. [...] Experiment Setup Datasets. We select five popular datasets in our experiment, including: (1) ALOI, (2) Caltech101-all, (3) NUS-WIDE, (4) HW, and (5) BDGP. [...] Evaluation Metrics. We utilize three popular evaluation metrics, including the Clustering Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI), to evaluate the clustering performance. [...] Ablation Study To validate the necessity of each component in DMVC-CE, we conduct an ablation study to evaluate their influences. |
| Researcher Affiliation | Academia | (1) Department of Computer and Information Science, SKL-IOTSC, University of Macau, China; (2) Institute of Data Science, National University of Singapore, Singapore; (3) College of Computer Science and Technology, Zhejiang University, China |
| Pseudocode | Yes | Algorithm 1: Training procedure of DMVC-CE. Input: Multi-view dataset {X^(v)}_{v=1}^{V}, number of views V, number of experts M, number of selected experts K, number of clusters C. Output: The cluster labels. 1: Initialize the network parameters. 2: while not convergence do 3: Extract the fused feature H via Eqs. (2), (3) and top-K expert selection; 4: Obtain reconstruction data X^(v) for each view via Eq. (5); 5: Calculate the reconstruction loss L_recon via Eq. (6); 6: Calculate the equilibrium loss ℓ_e via Eq. (10); 7: Calculate the distinctiveness loss R²_H({E_m}_{m=1}^{M}) via Eq. (12); 8: Back-propagate and update the network parameters, including θ, ϕ, and ψ. 9: end while 10: Perform k-means to obtain the clustering results. 11: return The cluster labels. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, a link to a code repository, or mention of code in supplementary materials. |
| Open Datasets | Yes | Experiment Setup Datasets. We select five popular datasets in our experiment, including: (1) ALOI, (2) Caltech101-all, (3) NUS-WIDE, (4) HW, and (5) BDGP. Table 2 briefly summarizes the pivotal information of these datasets. (Table 2 is titled: Brief illustration of the benchmark datasets.) |
| Dataset Splits | No | The paper evaluates on several multi-view clustering benchmarks using ACC, NMI, and ARI, with results averaged over 10 trials. However, it does not specify any training, validation, or test splits. While clustering work commonly trains the representation learner on the full dataset and evaluates on the same data, the explicit splitting information required for reproducibility is absent. |
| Hardware Specification | Yes | All experiments in this paper are run on the NVIDIA Tesla A100 GPU and AMD EPYC 7532 CPU. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies, libraries, or frameworks with their version numbers (e.g., Python, PyTorch, TensorFlow versions) that were used to implement the methodology. |
| Experiment Setup | Yes | Implementation Details. We fixed the total number of experts M = 10 and the selected expert number K = 3 in the MoE representation learner. Each expert uses a neuron setting of 2000-500-500-d_h, where the latent dimension d_h is fixed as 10. The network architecture of the decoder follows the same configuration as the expert. Besides, two hyper-parameters λ and γ vary in {0.001, 0.1, ..., 100} to achieve optimal performance, and the batch size and learning rate are set to 100 and 0.005, respectively. |
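Since the paper releases no code, the fused-feature step of Algorithm 1 (line 3: gating plus top-K expert selection) can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the function names (`top_k_gating`, `moe_forward`) and the gating parameterization are assumptions, and each expert's 2000-500-500-10 MLP from the reported setup is replaced by a single linear map for brevity. Only the stated constants (M = 10 experts, K = 3 selected, input dimension 2000, latent dimension d_h = 10) come from the paper.

```python
import numpy as np

def top_k_gating(gate_logits, k):
    """Keep only the top-k gate scores per sample, softmax-renormalize them.

    gate_logits: (batch, M) raw scores over M experts.
    Returns a (batch, M) matrix with exactly k positive weights per row,
    each row summing to 1; non-selected experts get weight 0.
    """
    idx = np.argsort(gate_logits, axis=1)[:, -k:]  # indices of the k largest scores
    masked = np.full_like(gate_logits, -np.inf)    # -inf -> weight 0 after softmax
    np.put_along_axis(masked, idx,
                      np.take_along_axis(gate_logits, idx, axis=1), axis=1)
    exp = np.exp(masked - masked.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def moe_forward(h, expert_weights, gate_weights, k=3):
    """Fused feature H as a gate-weighted sum of the selected experts' outputs."""
    gates = top_k_gating(h @ gate_weights, k)               # (batch, M)
    expert_out = np.einsum('bd,mdh->bmh', h, expert_weights)  # all experts' outputs
    return np.einsum('bm,bmh->bh', gates, expert_out)       # mix only top-k (rest weigh 0)

# Toy forward pass with the paper's dimensions (random weights, illustrative only).
rng = np.random.default_rng(0)
M, d_in, d_h, batch = 10, 2000, 10, 4
h = rng.standard_normal((batch, d_in))
W_experts = rng.standard_normal((M, d_in, d_h)) * 0.01  # one linear map per expert
W_gate = rng.standard_normal((d_in, M)) * 0.01
H = moe_forward(h, W_experts, W_gate, k=3)              # fused feature, shape (4, 10)
```

In the paper's full procedure, H then feeds per-view decoders for the reconstruction loss, while the gate weights enter the equilibrium loss and the experts' outputs enter the distinctiveness loss, before k-means is run on H.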