Semi-Supervised Multimodal Classification Through Learning from Modal and Strategic Complementarities
Authors: Junchi Chen, Richong Zhang, Junfan Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies demonstrate the effectiveness of the proposed framework. Experimental analyses demonstrate the effectiveness of our method. |
| Researcher Affiliation | Academia | 1 CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China 2 School of Software, Beihang University, Beijing, China 3 Zhongguancun Laboratory, Beijing, China EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods in narrative text and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes will be released in https://github.com/cjc20000323/SSMC. |
| Open Datasets | Yes | We select three datasets to construct benchmarks for SSMC: N24News (Wang et al. 2021), UPMC-Food101 (Bossard, Guillaumin, and Van Gool 2014), CrisisMMD (Alam, Ofli, and Imran 2018). |
| Dataset Splits | No | We split the training set into 20, 50, 100 labeled pairs. This specifies the number of labeled samples used for semi-supervised learning but does not provide comprehensive train/test/validation splits for the entire datasets, including the unlabeled data or how the test sets were constructed. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware (like GPU or CPU models, or memory) used for running the experiments. |
| Software Dependencies | No | We use BERT and ViT to encode texts and images. MSC is combined with FixMatch and FreeMatch. We use RandAugment to gain augmented images and obtain augmented texts with swap and synonym strategy. We employ Sentence-BERT to calculate similarity between augmented and original texts... We select AdamW as optimizer. The paper mentions software components but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The batch size of labeled data Bx is set to 4 and batch size of unlabeled data Bu is set to 32. We set 5e-5 as learning rate. We use RandAugment to gain augmented images and obtain augmented texts with swap and synonym strategy. We employ Sentence-BERT to calculate similarity between augmented and original texts, choosing text with higher similarity as weak-augment and the other one as strong-augment. We select AdamW as optimizer. η is set to 0.95. The train epoch is set to 20. β1, β2, β3 are set to 1 for N24News and UPMC-Food101 and set to 0.6 for Crisis MMD. |
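The setup describes a similarity-based rule for labeling text augmentations: the augmented text closer to the original (per sentence-embedding similarity) is treated as the weak augmentation, the other as the strong one. A minimal sketch of that rule, assuming embeddings have already been produced by a sentence encoder (the paper uses Sentence-BERT; the function names here are hypothetical, not from the released code):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_weak_strong(orig_emb: np.ndarray,
                       aug_a_emb: np.ndarray,
                       aug_b_emb: np.ndarray) -> tuple:
    """Label the augmentation more similar to the original text as
    'weak' and the other as 'strong', per the described strategy.
    Returns (weak_id, strong_id) with ids 'a' / 'b'."""
    sim_a = cosine_sim(orig_emb, aug_a_emb)
    sim_b = cosine_sim(orig_emb, aug_b_emb)
    return ("a", "b") if sim_a >= sim_b else ("b", "a")

# Toy usage with stand-in 2-d "embeddings": augmentation 'a' is nearly
# identical to the original, so it is chosen as the weak augmentation.
weak, strong = assign_weak_strong(np.array([1.0, 0.0]),
                                  np.array([0.9, 0.1]),
                                  np.array([0.0, 1.0]))
```

In practice the two augmented texts would come from the swap and synonym strategies mentioned above, and the embeddings from a pretrained sentence encoder.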