Latent mixed-effect models for high-dimensional longitudinal data
Authors: Priscilla Ong, Manuel Haußmann, Otto Lönnroth, Harri Lähdesmäki
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LMM-VAE on three tasks: (1) missing value imputation, (2) future prediction, and (3) time-based interpolation, measuring its performance using the Mean Squared Error (MSE). Our primary comparison is with the GP prior VAEs, which are most similar to our approach in modelling the prior through regression techniques. For consistency, we employed similar encoder and decoder architectures across all methods (see Appendix M for further details). We rely on Health MNIST and Rotating MNIST, which are two datasets derived from MNIST (LeCun et al., 2010), and the Physionet Challenge 2012 dataset (Silva et al., 2012). See Appendices H, I and K for hyperparameters and further details on the experimental set-up not discussed in the main paper. Results in bold are within one standard deviation of the best mean per experimental set-up. As a consistency check, we also compared the performance of LMM and LMM-VAE on a simplified, low-dimensional set-up, which is presented in Appendix E. |
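Since the three tasks involve scoring reconstructions only where ground truth is available (e.g. held-out entries in the imputation task), the MSE is typically computed over a mask of observed positions. A minimal sketch of such a masked MSE, with illustrative names not taken from the paper:

```python
def masked_mse(y_true, y_pred, mask):
    """Mean squared error restricted to entries where mask is truthy.

    y_true, y_pred, mask: nested lists (rows of values) of equal shape.
    Only positions with mask == 1 contribute to the average, which is
    how imputation / forecasting errors are usually scored on partially
    observed data.
    """
    num, den = 0.0, 0
    for yt_row, yp_row, m_row in zip(y_true, y_pred, mask):
        for yt, yp, m in zip(yt_row, yp_row, m_row):
            if m:
                num += (yt - yp) ** 2
                den += 1
    return num / den if den else float("nan")
```

For example, with one masked-out entry, only the three observed positions are averaged: `masked_mse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 3.0], [3.0, 6.0]], [[1, 1], [1, 0]])` gives 1/3.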
| Researcher Affiliation | Academia | Priscilla Ong (EMAIL), Department of Computer Science, Aalto University; Manuel Haußmann (EMAIL), Department of Mathematics and Computer Science, University of Southern Denmark; Otto Lönnroth (EMAIL), Department of Computer Science, Aalto University; Harri Lähdesmäki (EMAIL), Department of Computer Science, Aalto University |
| Pseudocode | No | The paper describes methodologies in text and mathematical equations, but there are no explicit figures, blocks, or sections labeled "Pseudocode" or "Algorithm" with structured steps. |
| Open Source Code | No | The paper mentions external code implementations for baseline models and metrics (SVGP-VAE, GPP-VAE, LVAE, iVAE MCC implementation) by providing their GitHub links. However, there is no explicit statement or link indicating that the source code for the proposed LMM-VAE methodology itself is publicly available from the authors. |
| Open Datasets | Yes | We rely on Health MNIST and Rotating MNIST, which are two datasets derived from MNIST (LeCun et al., 2010), and the Physionet Challenge 2012 dataset (Silva et al., 2012). We use a medical time series dataset derived from a randomized control trial for the treatment of colorectal cancer (dataset identifier: Colorec Sanfi U 2007 131). This dataset is taken from the open data sharing platform Project Data Sphere (Green et al., 2015). |
| Dataset Splits | Yes | For Health MNIST: We withhold the last 15 timepoints of 100 subjects to construct the test set. The first five timepoints of these aforementioned subjects are included in the training set. The remaining dataset is then randomly split to construct the train and validation sets, in an approximate ratio of 85 : 15. For Rotating MNIST: To construct the test set, we consider 80 instances at random, and take subsequences corresponding to four consecutive angles of the aforementioned digits. The train and validation sets are then randomly constructed based on the remaining images, in an approximate ratio of 80 : 20. For Physionet Challenge 2012: We withhold the last 10 timepoints of 1200 patients to construct the test set. The train and validation sets are then randomly constructed based on the remaining observations, in an approximate ratio of 80 : 20. For Project Data Sphere: For the test set, we consider the sequence of visits following the 5th visit (inclusive) of 100 patients. The first 4 visits of these aforementioned patients are included in the train set. The remainder of the dataset is then split amongst the train and validation sets. |
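The Health MNIST protocol above (withhold the last 15 of 20 timepoints for 100 test subjects, keep those subjects' first 5 timepoints in training, and split the remaining subjects' data roughly 85:15 into train/validation) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; all names and the seed handling are assumptions:

```python
import random

def split_health_mnist(subject_ids, n_timepoints=20, n_test_subjects=100,
                       n_context=5, val_frac=0.15, seed=0):
    """Return (train, val, test) lists of (subject, timepoint) pairs,
    following the split protocol described for Health MNIST."""
    rng = random.Random(seed)
    subjects = list(subject_ids)
    rng.shuffle(subjects)
    test_subjects = subjects[:n_test_subjects]
    rest = subjects[n_test_subjects:]

    # Last (n_timepoints - n_context) timepoints of test subjects -> test set.
    test = [(s, t) for s in test_subjects for t in range(n_context, n_timepoints)]
    # First n_context timepoints of those same subjects -> train set.
    train = [(s, t) for s in test_subjects for t in range(n_context)]

    # Remaining subjects' samples split ~85:15 into train/validation.
    pool = [(s, t) for s in rest for t in range(n_timepoints)]
    rng.shuffle(pool)
    n_val = int(round(val_frac * len(pool)))
    val = pool[:n_val]
    train += pool[n_val:]
    return train, val, test
```

Analogous helpers with different `n_context` / `val_frac` values would cover the Physionet and Project Data Sphere splits, which follow the same withhold-the-tail pattern.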
| Hardware Specification | No | The paper states, 'We thank the Aalto Science-IT project for the generous computational resources.' This is a general acknowledgment of computational resources but does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of the 'lme4 library in R' for the LMM implementation and the 'Adam optimizer (Kingma & Ba, 2015)' or 'AdamW optimizer (Loshchilov & Hutter, 2019)' for training. However, it does not provide specific version numbers for R, the lme4 library, or any deep learning frameworks (e.g., PyTorch, TensorFlow) and their respective versions, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | For experiments regarding LMM-VAE, we use the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.001. We monitor the loss on the validation set and employ a strategy similar in spirit to early stopping, where we save the weights of the model with the optimal validation loss. LMM-VAE was allowed to run for a maximum of 2500 epochs. We define σz = 1. For the linearly generated dataset, the decoder was parameterized as a linear function and trained with a learning rate of 0.001. For the non-linearly generated data, the decoder consisted of two hidden layers with Tanh nonlinearities and hidden sizes of 16 and 8, trained with a learning rate of 0.01. Across both set-ups, we use a latent dimension of 1, set σz to 0.005, and use the AdamW optimizer (Loshchilov & Hutter, 2019). The paper also includes detailed Neural Network Architectures in Appendix M, specifying hyperparameters like dimensionality of input, number of convolution/feedforward layers, kernel sizes, strides, hidden unit widths, and activation functions for different experiments. |
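The "similar in spirit to early stopping" strategy described above — run for up to 2500 epochs while keeping a snapshot of the weights at the best validation loss — can be sketched framework-agnostically. The callback names (`train_step`, `val_loss_fn`, `get_weights`) are illustrative placeholders, not from the paper:

```python
def train_with_best_checkpoint(train_step, val_loss_fn, get_weights,
                               max_epochs=2500):
    """Train for up to max_epochs and return the weights snapshot
    taken at the epoch with the lowest validation loss, rather than
    stopping training when the loss first rises."""
    best_loss, best_weights = float("inf"), None
    for epoch in range(max_epochs):
        train_step(epoch)          # one optimization pass over the data
        loss = val_loss_fn()       # validation loss after this epoch
        if loss < best_loss:
            best_loss, best_weights = loss, get_weights()
    return best_weights, best_loss
```

Unlike classic early stopping with a patience counter, this variant always runs the full budget and only selects the checkpoint afterwards, which matches the paper's description of saving the model with the optimal validation loss.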