Causal Representation Learning from Multimodal Biomedical Observations
Authors: Yuewen Sun, Lingjing Kong, Guangyi Chen, Loka Li, Gongxu Luo, Zijian Li, Yixuan Zhang, Yujia Zheng, Mengyue Yang, Petar Stojanov, Eran Segal, Eric P. Xing, Kun Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we present a practical framework to instantiate our theoretical insights. We demonstrate the effectiveness of our approach through extensive experiments on both numerical and synthetic datasets. Results on a real-world human phenotype dataset are consistent with established biomedical research, validating our theoretical and methodological framework. |
| Researcher Affiliation | Academia | 1Mohamed bin Zayed University of Artificial Intelligence, 2Carnegie Mellon University, 3University of Bristol, 4Broad Institute of MIT and Harvard |
| Pseudocode | Yes | G ALGORITHM PSEUDOCODE Algorithm 1 Pseudocode for the proposed algorithm. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | For example, the human phenotype dataset (Levine et al., 2024) contains measurements from multiple modalities, including anthropometric data, sleep monitoring, and genetic information. We manually create a variant of the MNIST dataset to encode causal relationships between different modalities, using colored MNIST (Arjovsky et al., 2019) and fashion MNIST (Xiao et al., 2017) as two different modalities. The human phenotype dataset (Shilo et al., 2021) is a large-scale, longitudinal collection of phenotypic profiles from a diverse global population. |
| Dataset Splits | No | The paper describes the generation of numerical and synthetic datasets and mentions sample sizes (e.g., n=10000) but does not provide specific training, validation, or test splits for any of the datasets used in the experiments. |
| Hardware Specification | No | The only hardware mention is "The estimation framework was trained using the Adam optimizer on GPU"; no GPU model, count, or memory is specified. |
| Software Dependencies | No | The paper mentions general software components like "Adam optimizer", "MLP with Leaky ReLU", "CNN", "Conv Transpose2D", and "LSTMs", but does not specify any software libraries or packages with version numbers required for reproducibility. |
| Experiment Setup | Yes | The training process ran for a maximum of 10000 epochs, with early stopping applied if the validation loss did not improve for 20 consecutive epochs. Random seeds were used to ensure reproducibility, and results were averaged across experiments, with variance reported. Hyperparameters: α = [αInd, αSp, αRecon] are the weights assigned to each term in the composite objective function. The following settings were used: α = [1e-1, 1e-2, 1] for the synthetic dataset, α = [1e-2, 1e-3, 2] for the MNIST dataset, and α = [1e-1, 1e-2, 1] for the phenotype dataset. |
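The setup row above can be sketched in code. This is a minimal illustration, not the authors' implementation: the loss-term names and the `EarlyStopping` helper are assumptions; only the weight vector α and the patience of 20 epochs come from the paper.

```python
# Hedged sketch of the reported training setup. The three loss terms
# (independence, sparsity, reconstruction) are combined with the weights
# alpha = [alpha_ind, alpha_sp, alpha_recon] quoted in the table above.

def composite_loss(l_ind, l_sp, l_recon, alpha):
    """Weighted sum of the three objective terms."""
    a_ind, a_sp, a_recon = alpha
    return a_ind * l_ind + a_sp * l_sp + a_recon * l_recon

class EarlyStopping:
    """Stop when validation loss fails to improve for `patience` epochs
    (the paper reports patience = 20)."""
    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Example: the weights reported for the phenotype dataset.
alpha_phenotype = [1e-1, 1e-2, 1]
loss = composite_loss(0.5, 0.3, 0.2, alpha_phenotype)
```

In a real run the three scalar losses would come from the model's independence, sparsity, and reconstruction heads at each epoch, with `EarlyStopping.step` called on the validation composite loss.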