Efficient Modality Selection in Multimodal Learning
Authors: Yifei He, Runxiang Cheng, Gargi Balasubramaniam, Yao-Hung Hubert Tsai, Han Zhao
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present our empirical evaluation of modality selection via greedy submodular maximization (Algorithm 1), and modality importance ranking via MCI. ... Lastly, we evaluate our theoretical results on one synthetic and two real-world multimodal data sets in each learning setting (6 data sets in total spanning diverse types of modalities). Our evaluation outcome confirms the effectiveness of our framework and algorithms for modality selection. |
| Researcher Affiliation | Academia | Yifei He EMAIL University of Illinois Urbana-Champaign Runxiang Cheng EMAIL University of Illinois Urbana-Champaign Gargi Balasubramaniam EMAIL University of Illinois Urbana-Champaign Yao-Hung Hubert Tsai EMAIL Carnegie Mellon University Han Zhao EMAIL University of Illinois Urbana-Champaign |
| Pseudocode | Yes | Algorithm 1: Greedy Maximization of a Submodular Function f. Data: full set V = {X1, ..., Xk}, constraint q ∈ Z+. Input: f : 2^V → R, and p ∈ Z+, where p ≤ q ≤ \|V\|. Output: subset Sp. S0 = ∅; for i = 0, 1, ..., p−1 do: X*i = argmax over Xj ∈ V \ Si of (f(Si ∪ {Xj}) − f(Si)); Si+1 = Si ∪ {X*i} |
| Open Source Code | No | The paper does not provide an explicit link to a code repository, a statement that code is released as supplementary material, or any other clear means of accessing the source code for the methodology described. |
| Open Datasets | Yes | Patch-MNIST. This is a semi-synthetic data set built upon MNIST (LeCun and Cortes, 1998). ... PEMS-SF. This is a real-world time-series data set from the UCI machine learning repository (Dua and Graff, 2017). ... CMU-MOSI. This is a popular real-world benchmark in affective computing and multimodal learning (Zadeh et al., 2016). ... Appliances. This is a real-world data set aiming at predicting the energy consumption of a household (Candanedo et al., 2017). ... Communities and Crime. This is a real-world crime prediction data set from the UCI machine learning repository (Redmond and Highley, 2010). |
| Dataset Splits | Yes | Patch-MNIST has ten output classes, 50,000 training images, and 10,000 testing images. ... There are a total of 440 instances (days), with the train-val-test split being 200, 67, and 173 samples. ... Training and testing sample sizes are 1284 and 686 respectively. |
| Hardware Specification | No | The paper mentions general computational resources like "GPUs" but does not provide specific details such as GPU models, CPU models, memory specifications, or details about the computing environment used for experiments. |
| Software Dependencies | No | For Patch-MNIST, we use a convolutional neural network with one convolutional layer, one max pooling layer, and two fully-connected layers with ReLU for both estimation and prediction. The network is trained with the Adam optimizer on a learning rate of 1e-3. For PEMS-SF, we use a 3-layer neural network with ReLU activation and batch normalization for estimation. This is trained with the Adam optimizer on a learning rate of 5e-4. For prediction, we use a recent time-series classification pipeline (Dempster et al., 2020) for time-series data processing, followed by a linear Ridge Classifier (Löning et al., 2019). For CMU-MOSI, we experiment with two prediction model types: a linear classifier with Rocket Transformation for time series (same as the one for PEMS-SF); and a plain 3-layer fully-connected neural network with ReLU activation. ... The networks are trained with the Adam optimizer with a learning rate of 1e-3. |
| Experiment Setup | Yes | For Patch-MNIST, we use a convolutional neural network with one convolutional layer, one max pooling layer, and two fully-connected layers with ReLU for both estimation and prediction. The network is trained with the Adam optimizer on a learning rate of 1e-3. For PEMS-SF, we use a 3-layer neural network with ReLU activation and batch normalization for estimation. This is trained with the Adam optimizer on a learning rate of 5e-4. For CMU-MOSI, ... a plain 3-layer fully-connected neural network with ReLU activation. ... For the synthetic data set, the feature extractor is a single-layer fully connected network with 128 hidden units. For both the Appliances and the Communities and Crime data sets, the feature extractor is a 3-layer fully connected network with 128 hidden units. For the VAE, we use two 3-layer fully connected networks with 128 hidden units as the encoder and the decoder respectively. The latent dimension is 16. The networks are trained with the Adam optimizer with a learning rate of 1e-3. |
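The pseudocode row quotes the paper's greedy maximization of a submodular function. As a minimal sketch of that greedy loop: the paper's f scores modality subsets, which we stand in for here with a hypothetical toy coverage function (a standard monotone submodular example); `greedy_max`, `sets`, and the values below are illustrative, not from the paper.

```python
def greedy_max(f, V, p):
    """Greedily pick p elements of V maximizing the set function f (Algorithm 1)."""
    S = set()
    for _ in range(p):
        # Choose the element with the largest marginal gain f(S ∪ {x}) − f(S).
        best = max((x for x in V if x not in S),
                   key=lambda x: f(S | {x}) - f(S))
        S.add(best)
    return S

# Toy stand-in for f: weighted coverage over a hypothetical universe of items.
sets = {"X1": {1, 2, 3}, "X2": {4, 5}, "X3": {5}}
f = lambda S: len(set().union(*(sets[x] for x in S)))
print(greedy_max(f, set(sets), 2))  # selects the two most-covering "modalities"
```

For monotone submodular f, this greedy scheme carries the classic (1 − 1/e) approximation guarantee, which is what makes the paper's modality-selection formulation tractable.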
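The Patch-MNIST setup quoted above (one convolutional layer, one max-pooling layer, two fully-connected ReLU layers, Adam at learning rate 1e-3) can be sketched in PyTorch as follows. Channel count, kernel size, and hidden width are assumptions for illustration; the report does not quote them.

```python
import torch
import torch.nn as nn

class PatchMNISTNet(nn.Module):
    """Sketch of the described Patch-MNIST network; widths are assumed."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)  # 32 channels assumed
        self.pool = nn.MaxPool2d(2)                             # 28x28 -> 14x14
        self.fc1 = nn.Linear(32 * 14 * 14, 128)                 # hidden width assumed
        self.fc2 = nn.Linear(128, n_classes)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        x = torch.relu(self.fc1(x.flatten(1)))
        return self.fc2(x)

model = PatchMNISTNet()
# Optimizer and learning rate as quoted in the Experiment Setup row.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
logits = model(torch.zeros(4, 1, 28, 28))  # batch of 4 dummy 28x28 images
```

The sketch only fixes what the report states (layer types, activation, optimizer, learning rate); any faithful reproduction would need the remaining hyperparameters from the paper itself.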