Aggregation of Dependent Expert Distributions in Multimodal Variational Autoencoders

Authors: Rogelio A. Mancisidor, Robert Jenssen, Shujian Yu, Michael Kampffmeyer

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical analysis in this research shows that, when dependence between experts is considered, CoDE-VAE exhibits better performance in terms of balancing the trade-off between generative coherence and generative quality, as well as generating more precise log-likelihood estimations. Furthermore, CoDE-VAE minimizes the generative quality gap as the number of modalities increases, achieving unconditional FID scores similar to unimodal VAEs, which is a desirable property that is lacking in most current models. Finally, CoDE-VAE achieves a classification accuracy that is comparable to that of current state-of-the-art multimodal VAEs.
Researcher Affiliation | Academia | (1) Department of Data Science, BI Norwegian Business School, Oslo, Norway; (2) Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway; (3) Department of Computer Science, University of Copenhagen, Copenhagen, Denmark; (4) Dept. BAMJO, Norwegian Computing Center, Oslo, Norway; (5) Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands.
Pseudocode | Yes | Algorithm 1: Minibatch version of the CoDE-VAE algorithm.
Open Source Code | Yes | CoDE-VAE is available at: https://github.com/rogelioamancisidor/codevae.
Open Datasets | Yes | Following the standard experimental setup in this domain (Sutter et al., 2021; Daunhawer et al., 2022; Palumbo et al., 2023), performance is evaluated using the following multimodal datasets: the MNIST-SVHN-Text dataset (Sutter et al., 2020), composed of matching MNIST and SVHN digits and a text describing the digit; PolyMNIST (Sutter et al., 2021), composed of 5 MNIST images of the same digit but with different backgrounds and handwriting styles; and the Caltech Birds (CUB) dataset (Wah et al., 2011; Daunhawer et al., 2022), which is composed of images of birds paired with captions describing each bird.
Dataset Splits | Yes | The triples are created in a many-to-many mapping; therefore, there are 1,121,360 and 200,000 observations in the train and test sets, respectively. [...] Training and test sets have 60,000 and 10,000 images, respectively. [...] there are in total 117,880 pairs of image-captions, where 88,550 are used for model training and 29,330 for testing. [...] The train and test sets have 162,560 and 19,712 observations, respectively.
Hardware Specification | Yes | Models are trained on single A100 GPUs with AMD EPYC Milan processors with 24 cores. [...] Finally, we acknowledge Sigma2 (Norway) for awarding this project access to the LUMI supercomputer, owned by the EuroHPC Joint Undertaking, hosted by CSC (Finland) and the LUMI consortium through project no. NN10040K.
Software Dependencies | No | The paper mentions the Adam optimizer (Kingma & Ba, 2017), the LogisticRegression class in sklearn, a TensorFlow pre-trained Inception network, and the cv2 library, but it does not specify version numbers for any of these software components or libraries.
Experiment Setup | Yes | We train our CoDE-VAE model with the Adam optimizer with default values and a learning rate of 0.001, using mixed precision to speed up model training. Both image modalities are assumed to have Laplace likelihoods, whereas the text modality is assumed to have a categorical likelihood. The dimension of the latent space is set to 20, as in (Sutter et al., 2020; 2021; Palumbo et al., 2023; Mancisidor et al., 2024). [...] The β value is found by cross-validation using the values [0.1, 1, 5, 10, 15, 20]. We also consider β = 2.5 in the PolyMNIST data to replicate the setting in (Palumbo et al., 2023). For all datasets, we assume that the prior distribution is an isotropic Gaussian distribution, and the expert distributions are assumed to be multivariate Gaussian distributions with diagonal covariance matrices.
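The setup above (diagonal-Gaussian expert posteriors, an isotropic standard-normal prior, and a cross-validated β weight) implies a closed-form β-weighted KL regularizer. The sketch below is a minimal NumPy illustration of that term only, not the paper's full CoDE-VAE objective; the batch size, random inputs, and function name are illustrative assumptions.

```python
import numpy as np

def kl_diag_gaussian_vs_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    Closed form for a diagonal-Gaussian posterior against an isotropic
    standard-normal prior, as assumed in the experiment setup above.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Latent dimension 20, matching the setup quoted above.
latent_dim = 20
rng = np.random.default_rng(0)

# Illustrative batch of 8 expert-posterior parameters (not from the paper).
mu = rng.normal(size=(8, latent_dim))
log_var = rng.normal(scale=0.1, size=(8, latent_dim))

beta = 2.5  # one of the beta values considered for PolyMNIST
weighted_kl = beta * kl_diag_gaussian_vs_standard_normal(mu, log_var)
```

Because e^x ≥ 1 + x for all x, each per-dimension term is non-negative, so the regularizer is zero exactly when the posterior equals the prior (μ = 0, log σ² = 0).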