Learning Local Neighborhoods of Non-Gaussian Graphical Models
Authors: Sarah Liaw, Rebecca Morrison, Youssef Marzouk, Ricardo Baptista
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our algorithm in both Gaussian and non-Gaussian settings by comparing it to existing methods. Lastly, we show the scalability of the proposed approach by applying it to high-dimensional non-Gaussian examples, including a biological dataset with more than 150 variables. Numerical Results: We now aim to answer the following questions: (1) Can L-SING accurately quantify the conditional dependencies of X without relying on assumptions about the distribution of X? (2) Is L-SING computationally tractable for high-dimensional problems? The first and second experiments address question (1), while the second and third experiments address question (2). Additionally, we compare the performance of Ω̂_L-SING to existing methods on the same test dataset (see the arXiv version for detailed experimental setups). |
| Researcher Affiliation | Academia | ¹California Institute of Technology, ²University of Colorado Boulder, ³Massachusetts Institute of Technology. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: L-SING Algorithm |
| Open Source Code | Yes | The code to reproduce the numerical experiments is available at: https://github.com/SarahLiaw/L-SING. |
| Open Datasets | Yes | Finally, we address question (2) by demonstrating the scalability of L-SING on the high-dimensional curated Ovarian Data (Ganzfried et al. 2013), comprising gene expression profiles from 578 ovarian cancer patients sourced from The Cancer Genome Atlas (TCGA). |
| Dataset Splits | Yes | the final dataset included 156 genes (variables) and 578 samples, split into 346 training, 117 evaluation, and 115 validation samples. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided for running the experiments. The paper only mentions computation times in seconds. |
| Software Dependencies | No | The paper mentions UMNN (Unconstrained Monotonic Neural Networks) as a method and refers to algorithms like GLASSO and Lasso. However, it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Figure 5a shows the estimated generalized precision matrix Ω̂ for r = 5 pairs (d = 10), as computed using UMNN map components with [64, 64, 64] hidden layers and M = 5,000 training samples. To show the scalability of L-SING, Figure 5b presents Ω̂ for r = 20 pairs (d = 40), using M = 5,000 training samples and the same UMNN architecture. Figure 8a presents Ω̂, as computed using UMNN map components with [64, 128, 128] hidden layers. |
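The estimated generalized precision matrix Ω̂ reported above encodes conditional dependencies in its off-diagonal entries, so local neighborhoods can be read off by thresholding them. The sketch below is a hypothetical illustration of that final read-out step (the function name, threshold `tau`, and the toy chain-graph matrix are assumptions, not the authors' code or data):

```python
# Hypothetical sketch: recover each variable's neighborhood from an
# estimated generalized precision matrix Omega by thresholding the
# magnitude of its off-diagonal entries.

def neighborhoods(omega, tau=1e-3):
    """Return, for each variable i, the indices j != i with
    |omega[i][j]| > tau, i.e. its estimated graph neighborhood."""
    d = len(omega)
    return [
        [j for j in range(d) if j != i and abs(omega[i][j]) > tau]
        for i in range(d)
    ]

# Toy 4-variable example whose nonzero pattern is a chain 0-1-2-3.
omega = [
    [ 2.0, -0.8,  0.0,  0.0],
    [-0.8,  2.0, -0.7,  0.0],
    [ 0.0, -0.7,  2.0, -0.9],
    [ 0.0,  0.0, -0.9,  2.0],
]
print(neighborhoods(omega))  # [[1], [0, 2], [1, 3], [2]]
```

On the toy chain, each interior variable's neighborhood is its two chain neighbors and each endpoint's is one, matching the matrix's sparsity pattern.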