Assumption-lean and data-adaptive post-prediction inference

Authors: Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, Qiongshi Lu

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the statistical superiority and broad applicability of our method through simulations and real-data applications.
Researcher Affiliation | Academia | Jiacheng Miao, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53726, USA; Xinran Miao, Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA; Yixuan Wu, University of Wisconsin-Madison, Madison, WI 53726, USA; Jiwei Zhao, Department of Biostatistics and Medical Informatics and Department of Statistics, University of Wisconsin-Madison, Madison, WI 53726, USA; Qiongshi Lu, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53726, USA
Pseudocode | Yes | Algorithm 1: PSPA estimation with ML-predicted labels. Algorithm 2: PSPA estimation with ML-predicted covariates.
Open Source Code | Yes | The R code implementing PSPA and the benchmark methods, and replicating the simulation and real-data analyses, is available at https://github.com/qlu-lab/pspa.
Open Datasets | Yes | For example, the Genotype-Tissue Expression (GTEx) project is a comprehensive study of gene expression regulation across many human tissues (GTEx Consortium et al., 2015). We regressed DXA-BMD on these variables using data from the UK Biobank (UKB).
Dataset Splits | Yes | The labeled data contains 500 samples, and the unlabeled data contains 500, 1500, 2500, 5000, or 10000 samples, depending on the setting. In the UKB, DXA-BMD measurements are available for only 10% of participants; therefore, the Softimpute algorithm was used to impute DXA-BMD values for the remaining 90% of individuals in the unlabeled dataset.
Hardware Specification | No | The paper does not provide hardware details such as CPU/GPU models or memory used to run the experiments.
Software Dependencies | No | The paper mentions "R codes", a pre-trained random forest, and the Softimpute algorithm, but does not provide version numbers for any software libraries or programming languages.
Experiment Setup | Yes | In all simulations, the ground-truth coefficients are obtained by Monte Carlo approximation with 5 × 10^4 samples. The labeled data contains 500 samples, and the unlabeled data contains 500, 1500, 2500, 5000, or 10000 samples in different settings. A random forest with 100 trees is pre-trained on a hold-out dataset of 1000 samples. All simulations are repeated 1000 times.
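The post-prediction correction idea behind the PSPA algorithms listed above (a classical estimate plus a data-adaptively weighted correction built from ML-predicted labels) can be sketched for the simplest case, mean estimation. This is an illustrative sketch, not the paper's Algorithm 1: the function name `pspa_mean` is hypothetical, and the variance-minimizing weight formula is a standard choice assumed here for demonstration.

```python
import numpy as np

def pspa_mean(y_lab, yhat_lab, yhat_unlab):
    """Estimate E[Y] from labeled (y, yhat) pairs plus unlabeled predictions.

    Sketch of the post-prediction correction idea: the naive labeled-data
    mean is debiased-by-construction, and the prediction-based correction
    is scaled by a data-adaptive weight omega.
    """
    n, N = len(y_lab), len(yhat_unlab)
    # Variance-minimizing weight (assumed form): shrinks toward 0 when the
    # ML predictions are uninformative, so the method never does worse than
    # the classical labeled-data estimator.
    omega = np.cov(y_lab, yhat_lab)[0, 1] / (np.var(yhat_lab, ddof=1) * (1 + n / N))
    # Classical estimate plus a weighted correction from ML predictions.
    return y_lab.mean() + omega * (yhat_unlab.mean() - yhat_lab.mean())

rng = np.random.default_rng(0)
x_lab, x_unlab = rng.normal(size=500), rng.normal(size=5000)
f = lambda x: 2 * x + 0.5                      # stand-in for a pre-trained ML model
y_lab = 2 * x_lab + 0.5 + rng.normal(size=500)  # true mean of Y is 0.5
est = pspa_mean(y_lab, f(x_lab), f(x_unlab))
```

When the predictions are accurate, the weight approaches N/(N + n) and most of the unlabeled sample's precision is captured; when they are pure noise, the covariance term drives the weight toward zero.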
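The Softimpute step mentioned under Dataset Splits fills missing matrix entries with a low-rank fit via iteratively soft-thresholded SVD. The UKB analysis used an existing Softimpute implementation; the minimal sketch below, with an assumed tuning parameter `lam`, only illustrates the idea.

```python
import numpy as np

def soft_impute(X, lam=1.0, n_iters=100):
    """Fill NaNs in X with a low-rank approximation (SoftImpute-style sketch)."""
    mask = np.isnan(X)
    Z = np.where(mask, 0.0, X)  # initialize missing cells with zeros
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - lam, 0.0)      # soft-threshold the singular values
        low_rank = (U * s) @ Vt           # shrunk low-rank reconstruction
        Z = np.where(mask, low_rank, X)   # keep observed entries fixed
    return Z

X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 2.0])  # rank-1 matrix
X[0, 0] = np.nan                                     # hide one entry
Z = soft_impute(X, lam=0.1, n_iters=200)             # Z[0, 0] is close to 1
```

Because the hidden matrix is exactly rank 1 and the threshold is small, the missing cell is recovered almost exactly; on real data like DXA-BMD, `lam` trades reconstruction fidelity against rank.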
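The simulation pipeline described under Experiment Setup (a 100-tree random forest pre-trained on a 1000-sample hold-out set, then applied to labeled and unlabeled data) can be reconstructed as follows. The linear data-generating model here is an assumption made only for demonstration; the paper's actual simulation designs may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def gen(n):
    """Assumed data-generating process: a simple linear signal plus noise."""
    x = rng.normal(size=(n, 1))
    y = 1.5 * x[:, 0] + rng.normal(size=n)
    return x, y

# Hold-out data used only to pre-train the ML model (1000 samples, 100 trees).
x_hold, y_hold = gen(1000)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(x_hold, y_hold)

# Labeled data (n = 500) and unlabeled data (here N = 5000, labels discarded),
# matching one of the settings described above.
x_lab, y_lab = gen(500)
x_unlab, _ = gen(5000)
yhat_lab, yhat_unlab = model.predict(x_lab), model.predict(x_unlab)
```

In the full study this pipeline would be wrapped in a loop over the five unlabeled-sample sizes and repeated 1000 times, with ground-truth coefficients approximated once from 5 × 10^4 Monte Carlo samples.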