Conditional Testing based on Localized Conformal $p$-values

Authors: Xiaoyang Wu, Lin Lu, Zhaojun Wang, Changliang Zou

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical simulations and real-data examples validate the superior performance of our proposed strategies. ... We also validate our methods through simulations and real-data experiments. With commonly used prediction algorithms, our proposed methods exhibit superiority in terms of both validity and power compared to existing approaches.
Researcher Affiliation | Academia | Xiaoyang Wu, Lin Lu, Zhaojun Wang, Changliang Zou. NITFID, School of Statistics and Data Science, LPMC and KLMDASR and LEBPS, Nankai University, Tianjin, China. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | Yes | The outlier detection procedure is summarized in Algorithm 1 in Appendix D. ... We summarize our label screening procedure in Algorithm 2 in Appendix D. ... Algorithm 3: Two-sample conditional distribution test via aggregation of simplified LCPs. Input: two samples $D_1$ and $D_2$; density ratio estimation subroutines $\mathcal{A}_1, \mathcal{A}_2$; kernel density $H(\cdot, \cdot)$; the nominal type I error level $\alpha \in (0, 1)$. Step 1: Randomly split the two samples as $D_1 = D_1^{T} \cup D_1^{C}$ and $D_2 = D_2^{T} \cup D_2^{C}$.
Open Source Code | Yes | Reproducibility Statement: Code for implementing our methods and reproducing the experiments and figures in our paper is available at https://github.com/lulin2023/LCP-testing.
Open Datasets | Yes | We utilize the health indicators dataset from Kaggle (Kaggle, 2021) to demonstrate the performance of our conditional label screening method. ... We consider using the House Sales in King County, USA dataset (Kaggle, 2016) ... we consider using the airfoil dataset (Brooks et al., 2014) from the UCI Machine Learning Repository to demonstrate the effectiveness of our proposed LCT on real data.
Dataset Splits | Yes | We fix the size of the labeled data $D_1$ at $n = 500$ and the test data $D_2$ at $m = 2{,}000$, randomly sampling them from the original dataset without replacement in each replication. ... We randomly sample three parts of the data from the whole dataset: $n_{\mathrm{tr}} = 2{,}000$ training data, $n_{\mathrm{cal}} = 2{,}000$ calibration data and $n_{\mathrm{te}} = 3{,}000$ test data. ... Under each scenario, we fix $d = 5$ and consider different sample sizes $n = m \in \{200, 400, 600, 800, 1000\}$ with equal splitting. ... Random partition: randomly partition the dataset into two groups with sizes $|D_1| = 751$ and $|D_2| = 752$.
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper mentions several algorithms, such as linear logistic regression (LL), neural network (NN) and random forest (RF), but does not specify any software libraries or versions used for implementation.
Experiment Setup | Yes | For the conditional outlier detection problem, we consider a heterogeneous linear regression model with label $Y$ in which the data are generated as follows. Scenario A1: the covariate vector consists of $X = (X_1, \ldots, X_{d-1}) \in \mathbb{R}^{d-1}$ with $d = 10$ and an additional time feature $t \in \mathbb{R}$. The model is $Y = X\beta + (3 + 2\sin(2\pi t))\,\varepsilon$, with $X_1, \ldots, X_{d-1} \sim U[-1, 1]$, $t \sim U[0, 1]$ and $\varepsilon \sim N(0, 1)$ independently. The coefficient vector is $\beta = (0.5, 0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0)$. ... We fix the nominal level at $\alpha = 0.05$, and apply two different algorithms to compute the score vectors: linear logistic regression (LL) and random forest (RF). ... For both scenarios, the kernel function of the LCP-od method is taken as the Gaussian kernel $H(x, x') = (2\pi h^2)^{-d/2} \exp\{-\|x - x'\|_2^2 / (2h^2)\}$ with bandwidth $h = (n/2)^{-1/(d+2)}$ for $d = 1, 2$ in Scenarios A1 and B1.
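The Scenario A1 data-generating process and the Gaussian localization kernel quoted above can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function names (`generate_scenario_a1`, `gaussian_kernel`) are our own and not from the paper's released code.

```python
import numpy as np

def generate_scenario_a1(n, d=10, rng=None):
    """Scenario A1: heterogeneous linear regression with a time feature.

    X_1, ..., X_{d-1} ~ U[-1, 1], t ~ U[0, 1], and
    Y = X @ beta + (3 + 2 sin(2 pi t)) * eps with eps ~ N(0, 1).
    """
    rng = rng or np.random.default_rng()
    X = rng.uniform(-1.0, 1.0, size=(n, d - 1))
    t = rng.uniform(0.0, 1.0, size=n)
    beta = np.array([0.5] * 5 + [0.0] * 4)  # coefficient vector from the paper
    eps = rng.standard_normal(n)
    Y = X @ beta + (3.0 + 2.0 * np.sin(2.0 * np.pi * t)) * eps
    return X, t, Y

def gaussian_kernel(x, x_prime, h):
    """Gaussian kernel H(x, x') = (2 pi h^2)^{-d/2} exp(-||x - x'||_2^2 / (2 h^2))."""
    d = x.shape[-1]
    sq_dist = np.sum((x - x_prime) ** 2, axis=-1)
    return (2.0 * np.pi * h ** 2) ** (-d / 2) * np.exp(-sq_dist / (2.0 * h ** 2))

# Localization over the scalar time feature (d = 1 in Scenario A1),
# with the paper's bandwidth rule h = (n/2)^{-1/(d+2)}.
n, d_loc = 500, 1
h = (n / 2) ** (-1.0 / (d_loc + 2))
```

Under this bandwidth rule, $h$ shrinks as $n$ grows, so the kernel localizes more tightly around each test point as more calibration data become available.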