Causal Representation Learning from Multimodal Biomedical Observations
Authors: Yuewen Sun, Lingjing Kong, Guangyi Chen, Loka Li, Gongxu Luo, Zijian Li, Yixuan Zhang, Yujia Zheng, Mengyue Yang, Petar Stojanov, Eran Segal, Eric P. Xing, Kun Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we present a practical framework to instantiate our theoretical insights. We demonstrate the effectiveness of our approach through extensive experiments on both numerical and synthetic datasets. Results on a real-world human phenotype dataset are consistent with established biomedical research, validating our theoretical and methodological framework. |
| Researcher Affiliation | Academia | 1Mohamed bin Zayed University of Artificial Intelligence, 2Carnegie Mellon University, 3University of Bristol, 4Broad Institute of MIT and Harvard |
| Pseudocode | Yes | G ALGORITHM PSEUDOCODE Algorithm 1 Pseudocode for the proposed algorithm. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | For example, the human phenotype dataset (Levine et al., 2024) contains measurements from multiple modalities, including anthropometric data, sleep monitoring, and genetic information. We manually create a variant of the MNIST dataset to encode causal relationships between different modalities, using colored MNIST (Arjovsky et al., 2019) and fashion MNIST (Xiao et al., 2017) as two different modalities. The human phenotype dataset (Shilo et al., 2021) is a large-scale, longitudinal collection of phenotypic profiles from a diverse global population. |
| Dataset Splits | No | The paper describes the generation of numerical and synthetic datasets and mentions sample sizes (e.g., n=10000) but does not provide specific training, validation, or test splits for any of the datasets used in the experiments. |
| Hardware Specification | No | The only hardware mention is "The estimation framework was trained using the Adam optimizer on GPU"; no GPU model, count, or memory is specified. |
| Software Dependencies | No | The paper mentions general software components like "Adam optimizer", "MLP with Leaky ReLU", "CNN", "Conv Transpose2D", and "LSTMs", but does not specify any software libraries or packages with version numbers required for reproducibility. |
| Experiment Setup | Yes | The training process ran for a maximum of 10000 epochs, with early stopping applied if the validation loss did not improve for 20 consecutive epochs. Random seeds were used to ensure reproducibility, and results were averaged across experiments, with variance reported. Hyperparameters: α = [αInd, αSp, αRecon] are the weights assigned to each term in the composite objective function. The following settings were used: α = [1e-1, 1e-2, 1] for the synthetic dataset, α = [1e-2, 1e-3, 2] for the MNIST dataset, and α = [1e-1, 1e-2, 1] for the phenotype dataset. |
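The setup row above can be sketched in code. This is a minimal illustration, not the authors' implementation: the loss-term names and the `EarlyStopping` helper are assumptions; only the weight vector α and the patience of 20 epochs come from the paper.

```python
# Hedged sketch of the reported training setup. The three loss terms
# (independence, sparsity, reconstruction) are combined with the weights
# alpha = [alpha_ind, alpha_sp, alpha_recon] quoted in the table above.

def composite_loss(l_ind, l_sp, l_recon, alpha):
    """Weighted sum of the three objective terms."""
    a_ind, a_sp, a_recon = alpha
    return a_ind * l_ind + a_sp * l_sp + a_recon * l_recon

class EarlyStopping:
    """Stop when validation loss fails to improve for `patience` epochs
    (the paper reports patience = 20)."""
    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Example: the weights reported for the phenotype dataset.
alpha_phenotype = [1e-1, 1e-2, 1]
loss = composite_loss(0.5, 0.3, 0.2, alpha_phenotype)
```

In a real run the three scalar losses would come from the model's independence, sparsity, and reconstruction heads at each epoch, with `EarlyStopping.step` called on the validation composite loss.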