Learning Multimodal Data Augmentation in Feature Space

Authors: Zichang Liu, Zhiqiang Tang, Xingjian Shi, Aston Zhang, Mu Li, Anshumali Shrivastava, Andrew Gordon Wilson

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type: Experimental
  "4 EXPERIMENTS. We evaluate LeMDA over a diverse set of real-world multimodal datasets. We curate a list of public datasets covering image, text, numerical, and categorical inputs. Table 1 provides a summary of the source, statistics, and modality identity. We introduce baselines in Section 4.1, and describe experimental settings in Section 4.2. We provide the main evaluation result in Section 4.3. Finally, we investigate the effects of the consistency regularizer and the choices of augmentation model architecture in Section 4.4."
Researcher Affiliation: Collaboration
  Zichang Liu, Department of Computer Science, Rice University; Zhiqiang Tang, Amazon Web Services; Xingjian Shi, Amazon Web Services; Aston Zhang, Amazon Web Services; Mu Li, Amazon Web Services; Anshumali Shrivastava, Department of Computer Science, Rice University; Andrew Gordon Wilson, New York University and Amazon Web Services
Pseudocode: Yes
  "Algorithm 1: LeMDA Training"
Open Source Code: Yes
  "Code is available at https://github.com/lzcemma/LeMDA/"
Open Datasets: Yes
  "We evaluate LeMDA over a diverse set of real-world multimodal datasets. We curate a list of public datasets covering image, text, numerical, and categorical inputs. Table 1 provides a summary of the source, statistics, and modality identity." Examples include SNLI-VE (Xie et al., 2019a).
Dataset Splits: No
  Table 1 provides train and test set sizes (e.g., Hateful Memes: 7,134 train / 1,784 test) but does not explicitly state a validation split or its size.
Hardware Specification: Yes
  "Experiments were conducted on a server with 8 V100 GPUs."
Software Dependencies: No
  The paper mentions using PyTorch Autograd but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup: Yes
  "For LeMDA, we set the confidence threshold for consistency regularizer α as 0.5... In our main experiment, we use w1 = 0.0001, w2 = 0.1, w3 = 0.1 on all datasets except Melbourne Airbnb and SNLI-VE. On Melbourne Airbnb and SNLI-VE, we use w1 = 0.001, w2 = 0.1, w3 = 0.1."
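The quoted thresholding can be illustrated with a confidence-masked consistency term: predictions on augmented features are regularized toward predictions on the original features only for samples whose original prediction is confident (max softmax probability ≥ α = 0.5). The following is a minimal NumPy sketch; the function name `consistency_loss` and the exact KL form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Confidence threshold alpha = 0.5 for the consistency regularizer,
# as quoted in the experiment setup above.
ALPHA = 0.5

def consistency_loss(p_orig, p_aug, alpha=ALPHA):
    """Confidence-masked KL(p_orig || p_aug) between softmax outputs.

    p_orig: (batch, classes) predictions on the original features.
    p_aug:  (batch, classes) predictions on the augmented features.
    Samples whose original prediction falls below the confidence
    threshold are excluded from the regularizer.
    """
    mask = p_orig.max(axis=1) >= alpha
    if not mask.any():
        return 0.0  # no confident samples: regularizer contributes nothing
    p, q = p_orig[mask], p_aug[mask]
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)
    return float(kl.mean())

# First sample is confident (max prob 0.8 >= 0.5) and contributes;
# the second (max prob 0.4 < 0.5) is masked out.
p_orig = np.array([[0.8, 0.1, 0.1],
                   [0.4, 0.3, 0.3]])
p_aug  = np.array([[0.6, 0.2, 0.2],
                   [0.3, 0.4, 0.3]])
loss = consistency_loss(p_orig, p_aug)
```

Masking on the original prediction's confidence means the regularizer only anchors augmentations to targets the task network is already sure about, rather than propagating uncertain predictions.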