Wasserstein F-tests for Fréchet regression on Bures-Wasserstein manifolds

Authors: Haoshu Xu, Hongzhe Li

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Simulations validate the accuracy of the asymptotic theory. Finally, we apply our methods to a single-cell gene expression dataset, revealing age-related changes in gene co-expression networks." ... "In this section, we propose a Riemannian gradient descent algorithm for optimizing (15) in Section 5.1. We then present a series of numerical experiments in Section 5.2 to validate our theoretical results on the central limit theorem (Theorem 11), asymptotic null distribution (Theorem 16) and power (Theorem 18)."
Researcher Affiliation | Academia | Haoshu Xu (EMAIL), Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA; Hongzhe Li (EMAIL), Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
Pseudocode | Yes | "Algorithm 1 (GD for Fréchet regression). Input: covariates {X_i}_{i=1}^n, responses {Q_i}_{i=1}^n, ρ > 0, n ≥ 1, covariate x, learning rate η, initialization S_0, maximum number of iterations T, threshold eps."
Open Source Code | No | The text lacks a clear, affirmative statement of code release. The paper mentions using the Python dcor package (Ramos-Carreño and Torrecilla, 2023) but does not provide access to the authors' own source code implementing the methodology described in the paper.
Open Datasets | Yes | "We are interested in understanding the co-expression structure of 61 genes in these KEGG nutrient-sensing pathways based on the recently published population-scale single-cell RNA-seq data of human peripheral blood mononuclear cells (PBMCs) from blood samples of over 982 healthy individuals with ages ranging from 20 to 90 (Yazar et al., 2022)."
Dataset Splits | No | The paper refers to a "single-cell gene expression dataset" but does not specify any training, validation, or test splits, nor does it refer to predefined splits from a cited source.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | Yes | "We compare the power of our test with that of the distance covariance test (dcov) (Székely et al., 2007) for testing independence, using the Python dcor package (Ramos-Carreño and Torrecilla, 2023)."
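For context on this dependency: the dcov statistic the paper benchmarks against is the sample distance covariance of Székely et al. (2007), which the dcor package computes. A minimal numpy sketch of that statistic (the function names here are illustrative, not the dcor API; dcor additionally provides permutation-based p-values for the independence test) might look like:

```python
import numpy as np

def distance_covariance_sq(x, y):
    """Squared sample distance covariance (V-statistic) of Szekely et al. (2007)
    for 1-D samples x, y of equal length.  Illustrative sketch only."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)                # pairwise distance matrices
    b = np.abs(y - y.T)
    # double-center each distance matrix: subtract row/column means, add grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def distance_correlation(x, y):
    """Sample distance correlation; equals 1 when y is an affine function of x."""
    v = distance_covariance_sq(x, y)
    vx = distance_covariance_sq(x, x)
    vy = distance_covariance_sq(y, y)
    return 0.0 if vx * vy == 0 else np.sqrt(v / np.sqrt(vx * vy))
```

In practice the test rejects independence for large values of the statistic, with the null distribution approximated by permutation, which is what the dcor package automates.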
Experiment Setup | Yes | "Algorithm 1 (GD for Fréchet regression). Input: covariates {X_i}_{i=1}^n, responses {Q_i}_{i=1}^n, ρ > 0, n ≥ 1, covariate x, learning rate η, initialization S_0, maximum number of iterations T, threshold eps." ... "For the initialization S_0, optimization over the Euclidean space typically starts near the origin (Chen et al., 2019; Ye and Du, 2021). However, since the space of symmetric positive definite (SPD) matrices, S_d^{++}, is nonlinear, the natural counterpart of the origin in this space is the identity matrix I_d. Therefore, we initialize at S_0 = I_d." ... "For the step size η, Altschuler et al. (2021) observed through numerical simulations that while the convergence rate of Euclidean gradient descent is highly sensitive to its step size, Riemannian gradient descent requires no tuning and works effectively with η = 1 when computing the Bures-Wasserstein barycenter. In our simulations, we also find that η = 1 performs at least as well as (and often better than) smaller step sizes." ... "setting η = 1, T = 30 and eps = 10^{-6} in Algorithm 1."
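The setup above (identity initialization S_0 = I_d, step size η = 1, at most T = 30 iterations, stopping threshold eps = 10^{-6}) can be sketched for the Bures-Wasserstein barycenter special case that Altschuler et al. (2021) study. This is a sketch under stated assumptions, not the paper's Algorithm 1: the function names are hypothetical, and the uniform weights below would be replaced by the covariate-dependent Fréchet-regression weights in the actual algorithm.

```python
import numpy as np

def sqrtm_spd(A):
    """Matrix square root of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_barycenter_gd(Qs, weights=None, eta=1.0, T=30, eps=1e-6):
    """Riemannian gradient descent for a weighted Bures-Wasserstein
    barycenter of SPD matrices, initialized at the identity.
    Illustrative sketch of the barycenter special case only."""
    n, d = len(Qs), Qs[0].shape[0]
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    S = np.eye(d)                              # S_0 = I_d
    for _ in range(T):
        S_half = sqrtm_spd(S)
        S_half_inv = np.linalg.inv(S_half)
        # Riemannian gradient via the optimal transport maps T_i
        # from N(0, S) to N(0, Q_i)
        G = np.zeros((d, d))
        for wi, Q in zip(w, Qs):
            Ti = S_half_inv @ sqrtm_spd(S_half @ Q @ S_half) @ S_half_inv
            G += wi * (Ti - np.eye(d))
        if np.linalg.norm(G) < eps:            # stopping threshold `eps`
            break
        M = np.eye(d) + eta * G                # step with eta = 1
        S = M @ S @ M                          # retraction back onto SPD matrices
    return S
```

With η = 1 this converges in a single step when all responses coincide (the transport maps then agree and the gradient vanishes at the next iterate), which is consistent with the reported insensitivity of the Riemannian scheme to step-size tuning.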