Score-Based Multimodal Autoencoder

Authors: Daniel Wesego, Pedram Rooshenas

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study our proposed methods and selected baselines using an extended version of PolyMNIST (Sutter et al., 2021) as well as the high-dimensional CelebAMask-HQ (Lee et al., 2020) dataset. We compare our methods SBM-VAE and SBM-RAE... We evaluate all methods on both prediction coherence and generative quality. To measure coherence, we use a pre-trained classifier to extract the label of the generated output and compare it with the associated label of the observed modalities (Shi et al., 2019). The coherence of unconditional generation is evaluated by counting the number of consistent predicted labels from the pre-defined classifier. We also measure the generative quality of the generated modalities using the FID score (Heusel et al., 2017). All reported results are averaged over at least 3 runs; the standard deviation is shown as shading under the curves in each figure. Figure 3 shows the generated samples from the third modality given the rest.
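The coherence protocol quoted above (label agreement under a pre-trained classifier, for both conditional and unconditional generation) can be sketched in a few lines of NumPy. This is a minimal illustration of the metric only; the function names and the logits-based interface are assumptions, not the paper's implementation.

```python
import numpy as np

def coherence(classifier_logits, target_labels):
    """Conditional coherence: fraction of generated samples whose predicted
    label (argmax of a pre-trained classifier's logits) matches the label
    of the observed modalities."""
    predicted = np.argmax(classifier_logits, axis=1)
    return float(np.mean(predicted == np.asarray(target_labels)))

def unconditional_coherence(per_modality_labels):
    """Unconditional coherence: fraction of generated multimodal samples for
    which every modality receives the same predicted label.
    `per_modality_labels` has shape (num_modalities, num_samples)."""
    labels = np.asarray(per_modality_labels)
    return float(np.mean(np.all(labels == labels[0], axis=0)))
```

The generative-quality side of the evaluation (FID) is a standard off-the-shelf metric and is omitted here.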
Researcher Affiliation | Academia | Daniel Wesego, Department of Computer Science, University of Illinois Chicago; Pedram Rooshenas, Department of Computer Science, University of Illinois Chicago
Pseudocode | Yes | Algorithms 1 and 2 show the training and inference procedures we use. Algorithm 1 Training... Algorithm 2 Inference...
Open Source Code | Yes | The code can be found at https://github.com/rooshenasgroup/sbmae
Open Datasets | Yes | We study our proposed methods and selected baselines using an extended version of PolyMNIST (Sutter et al., 2021) as well as the high-dimensional CelebAMask-HQ (Lee et al., 2020) dataset. ... We use the audio and image modalities from the MHD dataset by Vasco et al. (2022). ... We use partial samples, approximately 100K, from the SoundNet dataset (Aytar et al., 2016).
Dataset Splits | Yes | The extended PolyMNIST dataset was derived from the original PolyMNIST dataset of Sutter et al. (2020) with different background images and ten modalities. It has 50,000 training, 10,000 validation, and 10,000 test samples.
Hardware Specification | Yes | We use an A100 GPU for computing the time the models take.
Software Dependencies | No | No software dependencies with explicit version numbers are mentioned in the main text or appendix. The paper names specific algorithms, optimizers, and models (e.g., the Adam optimizer (Kingma & Ba, 2015), the Predictor-Corrector (PC) sampling algorithm (Song et al., 2020b), and the HiFi-GAN model) but gives no version numbers for their implementations or for other libraries.
Experiment Setup | Yes | Hyperparameters and neural network design are discussed in detail in Appendix A.2. ... The VAEs for each modality are trained with an initial learning rate of 0.001 using a β value of 0.1, where the prior, posterior, and likelihood are all Gaussians. ... We use a learning rate of 0.0002 with the Adam optimizer (Kingma & Ba, 2015). The detailed hyperparameters are shown in Table 4. We use the VPSDE with β0 = 0.1 and β1 = 5 with N = 100 and the PC sampling technique with Euler-Maruyama and Langevin dynamics. For fewer than 10 modalities, we use a β0 of 1; the other hyperparameters remain the same.
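The sampling setup quoted in this row (a linear VPSDE schedule with β0 = 0.1, β1 = 5, N = 100 discretization steps, and Predictor-Corrector sampling with an Euler-Maruyama predictor and a Langevin corrector) can be sketched as follows. This is a generic NumPy illustration of the technique, not the paper's code: `score_fn` stands in for a trained score network, and the `snr` value and stability cap are assumptions.

```python
import numpy as np

BETA_0, BETA_1, N = 0.1, 5.0, 100  # VPSDE schedule and step count from the quoted setup

def beta(t):
    # Linear VPSDE noise schedule: beta(t) = beta_0 + t * (beta_1 - beta_0), t in [0, 1]
    return BETA_0 + t * (BETA_1 - BETA_0)

def pc_sample(score_fn, shape, snr=0.16, rng=None):
    """Predictor-Corrector sampling: reverse-time Euler-Maruyama predictor
    followed by one Langevin dynamics corrector step per iteration."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)  # start from the VPSDE prior, approx. N(0, I)
    dt = -1.0 / N                   # integrate from t = 1 down to t = 0
    for i in range(N):
        t = 1.0 - i / N
        b = beta(t)
        # Predictor: Euler-Maruyama step of the reverse-time VPSDE
        drift = -0.5 * b * x - b * score_fn(x, t)
        x = x + drift * dt + np.sqrt(b * abs(dt)) * rng.standard_normal(shape)
        # Corrector: one Langevin step, sized by a signal-to-noise ratio heuristic
        grad = score_fn(x, t)
        grad_norm = np.linalg.norm(grad) / np.sqrt(x.size)
        eps = min(2 * (snr / max(grad_norm, 1e-8)) ** 2, 1.0)  # cap for stability
        x = x + eps * grad + np.sqrt(2 * eps) * rng.standard_normal(shape)
    return x
```

With `score_fn = lambda x, t: -x` (the score of a standard Gaussian), the loop simply resamples an approximately standard-normal variable; in the paper's setting the score network operates in the latent space of the modality-specific autoencoders.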