Multi-aspect Self-guided Deep Information Bottleneck for Multi-modal Clustering
Authors: Shizhe Hu, Jiahao Fan, Guoliang Zou, Yangdong Ye
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our method outperforms state-of-the-art multi-modal clustering methods, showcasing its superior performance and broad application prospects. |
| Researcher Affiliation | Academia | School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: MSDIB Algorithm. Input: Multi-modal dataset {X_i}_{i=1}^m; number of clusters k. Parameters: Hyperparameters α, β, and learning rate γ. Output: The label predictor C. |
| Open Source Code | Yes | Code: https://github.com/ShizheHu |
| Open Datasets | Yes | Caltech-2V (Fei-Fei, Fergus, and Perona 2004) consists of images from 7 categories, totaling 1440 images. It has the features of Wavelet moments (Shen and Ip 1999) and CENsus TRansform hISTogram (CENTRIST) (Wu and Rehg 2010), where each kind of feature is regarded as a modality. ESP-Game (Von Ahn and Dabbish 2004) comprises 11,032 images, consisting of 7 categories. The image features and the corresponding text description are used as two modalities. IAPR (Grubinger et al. 2006) is an image collection with semantic descriptions, consisting of 20,000 images and their corresponding textual descriptions. For this study, a total of 7,855 images with labels no less than 4 were selected and categorized into 6 classes. It utilizes the same two modalities as ESP-Game. MIRFlickr (Huiskes and Lew 2008) comprises a total of 12,154 images across 6 categories after denoising. It utilizes the same two modalities as ESP-Game. NUS-Wide (Chua et al. 2009) contains 20,000 images over 8 classes. It comprises a total of two modalities, including both image and text. |
| Dataset Splits | No | The paper lists several well-known multi-modal datasets and describes their contents, but it does not specify any training/validation/test splits, percentages, or methodology used for partitioning these datasets in their experiments. |
| Hardware Specification | Yes | We implemented the framework in PyTorch 1.13.0 on Windows 10 with a 24 GB NVIDIA RTX-3090 GPU and i7-12700F CPU. |
| Software Dependencies | Yes | We implemented the framework in PyTorch 1.13.0 on Windows 10 with a 24 GB NVIDIA RTX-3090 GPU and i7-12700F CPU. |
| Experiment Setup | Yes | Training converged within 100 epochs. We ran the model 20 times, selecting the highest accuracy at the lowest loss to avoid reporting runs stuck in poor local optima. The batch size was 100, using Adam with a learning rate of 0.0001. Grid search optimized trade-off parameters α and β in (0, 1) with a step size of 0.1. |
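The reported model-selection protocol (grid search over α, β in (0, 1) with step 0.1, multiple runs per setting, keeping the result with the highest accuracy at the lowest loss) can be sketched as below. This is a minimal illustration, not the authors' code: `train_and_eval` is a hypothetical hook standing in for one MSDIB training run that returns a (loss, accuracy) pair.

```python
import itertools

# Grid from the paper: alpha and beta range over (0, 1) with step 0.1.
ALPHAS = [round(0.1 * i, 1) for i in range(1, 10)]
BETAS = [round(0.1 * i, 1) for i in range(1, 10)]


def select_best(results):
    """Pick the run with the lowest loss, breaking ties by highest accuracy.

    `results` is a list of (loss, accuracy) tuples, one per run. This mirrors
    the paper's "highest accuracy at the lowest loss" selection rule.
    """
    return min(results, key=lambda r: (r[0], -r[1]))


def grid_search(train_and_eval, n_runs=20):
    """Evaluate every (alpha, beta) pair and return the best configuration.

    `train_and_eval(alpha, beta)` is a hypothetical callable that trains the
    model once and returns (loss, accuracy); the paper repeats it 20 times
    per setting.
    """
    best = None
    for alpha, beta in itertools.product(ALPHAS, BETAS):
        runs = [train_and_eval(alpha, beta) for _ in range(n_runs)]
        loss, acc = select_best(runs)
        if best is None or acc > best[2]:
            best = (alpha, beta, acc, loss)
    return best  # (alpha, beta, accuracy, loss)
```

With 9 x 9 grid points and 20 runs each, this protocol amounts to 1,620 training runs per dataset, which is worth keeping in mind when budgeting a reproduction.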