Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MultiZoo and MultiBench: A Standardized Toolkit for Multimodal Deep Learning

Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MULTIZOO, a public toolkit consisting of standardized implementations of > 20 core multimodal algorithms and MULTIBENCH, a large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. Together, these provide an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, we offer a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness.
Researcher Affiliation | Academia | Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov; Machine Learning Department and Language Technologies Institute, Carnegie Mellon University
Pseudocode | Yes | Algorithm 1 PyTorch code integrating MULTIBENCH datasets and MULTIZOO models. from datasets.get_data import get_dataloader; from unimodals.common_models import ResNet, Transformer; from fusions.common_fusions import MultInteractions; from training_structures.gradient_blend import train, test
Open Source Code | Yes | Our toolkits are publicly available, will be regularly updated, and welcome inputs from the community. Code: https://github.com/pliang279/MultiBench
Open Datasets | Yes | MULTIBENCH contains a diverse set of 15 datasets spanning 10 modalities and testing for 20 prediction tasks across 6 distinct research areas, and is designed to comprehensively evaluate generalization across domains and modalities, complexity during training and inference, and robustness to noisy and missing modalities. Table 1: MULTIBENCH provides a comprehensive suite of 15 datasets covering a diverse range of 6 research areas, dataset sizes, 10 input modalities (in the form of ℓ: language, i: image, v: video, a: audio, t: time-series, ta: tabular, f: force sensor, p: proprioception sensor, s: set, o: optical flow), and 20 prediction tasks. Examples: MUSTARD (Castro et al., 2019), CMU-MOSI (Zadeh et al., 2016), MIMIC (Johnson et al., 2016).
Dataset Splits | No | The paper mentions 'traindata, validdata, testdata = get_dataloader( multimodal_imdb )' in Algorithm 1 and discusses a standardized pipeline for data loading, but does not provide specific percentages, sample counts, or explicit references to predefined splits for any of the 15 datasets within the main text.
Hardware Specification | No | The paper states: 'We record the amount of information taken in bits (i.e., data size), the number of model parameters, as well as time and memory resources required during the entire training process. Real-world models may also need to be small and compact to run on mobile devices (Radu et al., 2016) so we also report inference time and memory on CPU and GPU.' It also acknowledges 'NVIDIA’s GPU support'. However, it does not specify the exact models or specifications of the CPUs or GPUs used in the experiments.
Software Dependencies | No | The paper includes 'Algorithm 1 PyTorch code integrating MULTIBENCH datasets and MULTIZOO models.' and shows imports like 'from training_structures.gradient_blend import train, test' and 'optimtype=torch.optim.SGD'. While it implies the use of PyTorch, it does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Algorithm 1 PyTorch code integrating MULTIBENCH datasets and MULTIZOO models. model = train(encoders, fusion, classifier, traindata, validdata, epochs=100, optimtype=torch.optim.SGD, lr=0.01, weight_decay=0.0001)
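
The Software Dependencies and Hardware Specification rows above note that the paper gives no version numbers or hardware details. As a minimal sketch of how someone re-running Algorithm 1 might record that missing context alongside the hyperparameters quoted in the Experiment Setup row, the following uses only the Python standard library. The names REPORTED_SETUP, package_version, and build_run_record are hypothetical and are not part of MultiBench or MultiZoo; the default package list is likewise an assumption.

```python
import json
import platform
from importlib import metadata

# Hyperparameters as quoted from Algorithm 1 in the Experiment Setup row.
REPORTED_SETUP = {
    "epochs": 100,
    "optimtype": "torch.optim.SGD",
    "lr": 0.01,
    "weight_decay": 0.0001,
}


def package_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def build_run_record(packages=("torch",)):
    """Combine the reported setup with details of the local environment.

    `packages` is an assumed list of distributions worth pinning; the paper
    itself only implies PyTorch.
    """
    return {
        "setup": dict(REPORTED_SETUP),
        "python": platform.python_version(),
        "machine": platform.machine(),
        "packages": {name: package_version(name) for name in packages},
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be stored next to experiment logs.
    print(json.dumps(build_run_record(), indent=2, sort_keys=True))
```

Running the script prints a JSON record. A package that is not installed is reported as null instead of raising an error, so the record can be produced even on a machine without PyTorch.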