Super Deep Contrastive Information Bottleneck for Multi-modal Clustering
Authors: Zhengzheng Lou, Ke Zhang, Yucong Wu, Shizhe Hu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on 4 multi-modal datasets and the accuracy of the method on the ESP dataset improved by 9.3%. The results demonstrate the superiority and clever design of the proposed SDCIB. |
| Researcher Affiliation | Academia | 1School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China. Correspondence to: Shizhe Hu <EMAIL, https://shizhehu.github.io/>. |
| Pseudocode | Yes | Algorithm 1 Algorithm for Optimizing the proposed SDCIB |
| Open Source Code | Yes | The source code is available on https://github.com/ShizheHu. |
| Open Datasets | Yes | Caltech-2V (Fei-Fei et al., 2004) contains 1,440 image samples, categorized into 7 classes based on WM and CENTRIST modalities. Event (Li & Fei-Fei, 2007) encompasses 1,579 sports event image samples, divided into 8 categories based on 3 modalities: Color Attention, SIFT, and TPLBP. IAPR (Grubinger et al., 2006) includes 7,855 image samples, accompanied by natural language descriptions, and is divided into 6 categories using SIFT representation and BoW model modalities. ESP (Von Ahn & Dabbish, 2005), sourced from a social image collection on an image annotation game website, comprises 11,032 image samples, categorized into 7 classes. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It lists the number of samples for each dataset but not how they were partitioned for experiments. |
| Hardware Specification | No | No specific hardware details (GPU, CPU, memory, etc.) are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. The paper mentions using the Adam optimizer but does not specify its version or the versions of any other libraries or programming languages. |
| Experiment Setup | Yes | The entire training process is completed within 40 epochs, with a batch size of 32. The proposed SDCIB consists of M modality-specific encoders, 4M mutual information estimators, and M clustering layers. Each modality-specific encoder contains 4 fully connected layers with dimensions of 1024, 1024, 1024, and 128, respectively. Each fully connected layer is followed by a BatchNorm layer for representation normalization and a ReLU layer as the activation function. The clustering layer consists of a fully connected layer and a softmax layer to obtain the final clustering results. The Adam optimizer is used for parameter optimization, with an initial learning rate of 0.0001. |
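The architecture quoted above is concrete enough to sketch. The following is a minimal NumPy forward pass for one modality-specific encoder (four fully connected layers of 1024, 1024, 1024, and 128 units, each followed by BatchNorm and ReLU) plus the clustering layer (fully connected + softmax). All function names, the 512-dimensional input, and the random initialization are illustrative assumptions; the paper's released code should be consulted for the actual implementation, training loop, and mutual information estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

def batch_norm(x, eps=1e-5):
    # Inference-style normalization over the batch dimension
    # (no learned scale/shift, for brevity).
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = x - x.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def make_encoder(in_dim, dims=(1024, 1024, 1024, 128)):
    # Layer widths taken from the experiment setup quoted above.
    params, d = [], in_dim
    for out in dims:
        params.append((rng.standard_normal((d, out)) * 0.01, np.zeros(out)))
        d = out
    return params

def encode(x, params):
    # FC -> BatchNorm -> ReLU, repeated for all four layers.
    for w, b in params:
        x = relu(batch_norm(linear(x, w, b)))
    return x

def cluster_probs(z, w, b):
    # Clustering layer: fully connected + softmax.
    return softmax(linear(z, w, b))

# One modality: batch of 32 (the paper's batch size); the 512-d input
# and 7 clusters (Caltech-2V / ESP class count) are illustrative.
x = rng.standard_normal((32, 512))
encoder = make_encoder(512)
z = encoder_out = encode(x, encoder)
wc, bc = rng.standard_normal((128, 7)) * 0.01, np.zeros(7)
p = cluster_probs(z, wc, bc)
print(p.shape)  # (32, 7): per-sample soft cluster assignments
```

In the full SDCIB model, one such encoder and clustering head would exist per modality, trained jointly with the mutual information estimators via Adam at learning rate 0.0001; those components are not sketched here.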