Multimodal Variational Autoencoder: A Barycentric View

Authors: Peijie Qiu, Wenhui Zhu, Sayantan Kumar, Xiwen Chen, Jin Yang, Xiaotong Sun, Abolfazl Razi, Yalin Wang, Aristeidis Sotiras

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies on three multimodal benchmark datasets demonstrated the effectiveness of the proposed method compared to other state-of-the-art methods.
Researcher Affiliation | Academia | (1) Washington University in St. Louis; (2) Arizona State University; (3) Clemson University; (4) University of Arkansas
Pseudocode | No | The paper contains mathematical formulations, lemmas, and theorems, but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states only "For more implementation details (e.g., hyperparameter configurations), we kindly direct the readers to Appendix B." There is no explicit statement about code release or a link to a repository.
Open Datasets | Yes | Comparative experiments were conducted on three multimodal benchmark datasets: (i) PolyMNIST with five simplified modalities, (ii) the trimodal MNIST-SVHN-TEXT, and (iii) the challenging bimodal CelebA dataset. PolyMNIST was generated by combining each MNIST digit (LeCun and Cortes 2010) with 28×28 random crops from five distinct background images, as described in Sutter, Daunhawer, and Vogt (2021). MNIST-SVHN-TEXT was introduced by Sutter, Daunhawer, and Vogt (2020) and consists of three modalities: MNIST digits (LeCun and Cortes 2010), text, and SVHN (Netzer et al. 2011). The bimodal CelebA includes human face images as well as text describing the face attributes (Liu et al. 2015).
Dataset Splits | No | The paper refers to a "test set" in the evaluation-metrics section and mentions that "20 triples were generated per set" for MNIST-SVHN-TEXT, but it does not provide explicit percentages, counts, or methodology for the training, validation, and test splits of any of the datasets used.
Hardware Specification | Yes | All experiments were performed on an NVIDIA A100 GPU with 40 GB of memory.
Software Dependencies | No | The paper notes: "For a fair comparison, we followed the experimental settings in previous literature (Shi et al. 2019; Sutter, Daunhawer, and Vogt 2021). In particular, we employed the same network architecture as in (Shi et al. 2019; Sutter, Daunhawer, and Vogt 2021)." It does not provide specific software dependencies with version numbers.
Experiment Setup | No | The main text defers hyperparameter configurations to an appendix ("For more implementation details (e.g., hyperparameter configurations), we kindly direct the readers to Appendix B.") and refers to previous literature for experimental settings and network architecture, but does not state them explicitly within the main body.
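The PolyMNIST construction quoted in the Open Datasets row (MNIST digits composited over random 28×28 crops of five background images, one per modality) can be sketched as below. This is a hypothetical illustration of the procedure described in Sutter, Daunhawer, and Vogt (2021), not their exact implementation; the `make_polymnist_sample` helper, the blending rule, and the array shapes are assumptions for the sketch.

```python
import numpy as np


def make_polymnist_sample(digit, backgrounds, rng):
    """Composite one 28x28 MNIST digit over a random crop of each background.

    Hypothetical sketch: `digit` is a (28, 28) grayscale array in [0, 1];
    `backgrounds` is a list of larger (H, W, 3) RGB arrays in [0, 1], one per
    modality. Returns five 28x28x3 images that share the same digit label.
    """
    modalities = []
    for bg in backgrounds:
        h, w = bg.shape[:2]
        # Sample the top-left corner of a random 28x28 crop.
        y = rng.integers(0, h - 28 + 1)
        x = rng.integers(0, w - 28 + 1)
        crop = bg[y:y + 28, x:x + 28].astype(np.float32)
        # Overlay the digit: where the digit is bright, invert the crop so
        # the digit stays visible against any background (assumed blending).
        mask = digit[..., None]  # (28, 28, 1), broadcasts over RGB channels
        modalities.append((1 - mask) * crop + mask * (1 - crop))
    return modalities


rng = np.random.default_rng(0)
digit = rng.random((28, 28))  # stand-in for a real MNIST digit
backgrounds = [rng.random((64, 64, 3)) for _ in range(5)]
sample = make_polymnist_sample(digit, backgrounds, rng)
```

Each modality differs only in its background, so the shared digit label is the common latent content the multimodal VAE must recover across modalities.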