reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

Authors: Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulations validate the accuracy of the high-dimensional asymptotics for finite dimensions. Simulations are provided to show the accuracy of our estimates in finite dimensions.
Researcher Affiliation	Academia	Fan Yang, Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China; Hongyang R. Zhang, Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, US; Sen Wu, Department of Computer Science, Stanford University, Stanford, CA 94305, US; Christopher Ré, Department of Computer Science, Stanford University, Stanford, CA 94305, US; Weijie J. Su, Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA 19104, US
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks. It describes methodologies and proofs using mathematical notation and natural language.
Open Source Code	Yes	The experiment code for reproducing these simulation results can be found at https://github.com/Virtuoso-Research/Transfer_learning_random_matrix_simulations.
Open Datasets	No	The paper describes generating synthetic data for its simulations (e.g., 'We sample the covariates X from a p-dimensional isotropic Gaussian', 'generate covariate-shifted features and different linear models'). It does not use or provide access information for any established public datasets.
Dataset Splits	No	The paper uses simulated data and defines sample sizes for source and target tasks (e.g., 'n1 samples', 'n2 samples'). However, it does not describe traditional train/test/validation splits for empirical data evaluation.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies	No	The paper does not specify any software libraries, tools, or programming languages with their version numbers that are needed to replicate the experiments.
Experiment Setup	Yes	In Figure 1, the text states: 'In this simulation, we set p = 100, n2 = 300, and σ² = 1/4.' Figure 2b mentions: 'This simulation fixes p = 100, n2 = 300 and varies n1, λ. Both simulations use σ = 1/2.' Figure 3a states: 'For this simulation, we set p = 50, n1 = n2 = 100, and σ = 1/2.' Figure 4a mentions: 'We fix model shift µ = 0.1 while varying covariate shift λ for each curve.' Figure 5b states: 'Figure 5b fixes r = 1 and varies µ, n. The results under different values of µ also match the conditions in Proposition 14.'
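
The quoted setup (isotropic Gaussian covariates, p = 100, n2 = 300, σ = 1/2) can be illustrated with a minimal sketch of how such synthetic target-task data might be generated. This is not the authors' code, which lives in the repository linked above. The choice of Python, the helper name sample_linear_task, the way the parameter vector beta is drawn, the random seed, and the use of ordinary least squares are all assumptions made for illustration only.

```python
import numpy as np


def sample_linear_task(n, p, beta, sigma, rng):
    """Draw n samples from y = X @ beta + noise.

    Hypothetical helper: X has i.i.d. standard normal entries
    (a p-dimensional isotropic Gaussian per row) and the noise is
    Gaussian with standard deviation sigma.
    """
    X = rng.standard_normal((n, p))
    noise = sigma * rng.standard_normal(n)
    y = X @ beta + noise
    return X, y


# Values quoted in the table for Figures 1 and 2b; the seed is an assumption.
rng = np.random.default_rng(0)
p, n2, sigma = 100, 300, 0.5

# Assumed target parameter; the table does not say how it is chosen.
beta = rng.standard_normal(p) / np.sqrt(p)

X2, y2 = sample_linear_task(n2, p, beta, sigma, rng)

# Ordinary least squares on the target task alone (assumed baseline).
beta_hat = np.linalg.lstsq(X2, y2, rcond=None)[0]

# With isotropic covariates, the excess risk of a linear predictor
# equals the squared Euclidean distance between the two parameter vectors.
excess_risk = float(np.sum((beta_hat - beta) ** 2))
print(f"excess risk of target-only OLS: {excess_risk:.4f}")
```

A transfer-learning comparison would additionally need a source task with n1 samples, a covariate shift λ, and a model shift µ, none of which the table specifies in enough detail to reproduce here.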