Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach
Authors: Qian Peng, Yajie Bao, Haojie Ren, Zhaojun Wang, Changliang Zou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Simulation: We write N(µ, σ²) for the normal distribution with mean µ and variance σ², SN(µ, σ², α) for the skewed normal with skewness parameter α, t(k) for the t-distribution with k degrees of freedom, and Bern(p) for the Bernoulli distribution with success probability p. Given any x ∈ ℝ^d, define f(x) = E(Y_i \| X_i = x) and η_i = Y_i − f(X_i). We consider three data generation settings in Lei et al. (2018): ... 7. Application on real data. 7.1. Airfoil data: We apply the proposed method to the airfoil dataset from the UCI Machine Learning Repository (Dua & Graff, 2019), where the response Y and covariates X (with 5 dimensions) are described in Appendix E.4. |
| Researcher Affiliation | Academia | 1School of Statistics and Data Sciences, LPMC, KLMDASR and LEBPS, Nankai University, Tianjin, China 2School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China. Correspondence to: Yajie Bao <EMAIL>, Haojie Ren <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 PDI-CP. Input: calibration set {(X_i, Y_i)}_{i=n₀}^{n}, test feature X_{n+1}, prediction model μ̂, detection procedure D, imputation procedure I, miscoverage level α. 1: Ô_{n+1} ← D(X_{n+1}); 2: X̌^DI_{n+1} ← I(X_{n+1}, Ô_{n+1}); 3: for i = n₀, …, n do; 4: Ô_i ← D(X_i); 5: X̌_i ← I(X_i, Ô_i ∪ Ô_{n+1}); 6: Ř_i ← \|Y_i − μ̂(X̌_i)\|; 7: end for; 8: Ĉ^PDI(X_{n+1}) ← μ̂(X̌^DI_{n+1}) ± q̂⁺_α({Ř_i}_{i=n₀}^{n}). Output: Ĉ^PDI(X_{n+1}) |
| Open Source Code | No | No explicit statement about code release or a link to a code repository is provided in the paper. The 'Impact Statement' section only discusses the general applicability of the tools introduced. |
| Open Datasets | Yes | 7.1. Airfoil data: We apply the proposed method to the airfoil dataset from the UCI Machine Learning Repository (Dua & Graff, 2019), where the response Y and covariates X (with 5 dimensions) are described in Appendix E.4. 7.2. Wind direction data: Another example involves the hourly wind direction data from a meteorological station in the Central-West region of Brazil (https://tempo.inmet.gov.br/TabelaEstacoes/A001). 7.3. Riboflavin data: To further demonstrate robustness, we test our method on the gene expression dataset for riboflavin production provided by DSM (Kaiseraugst, Switzerland), which was offered by Bühlmann & Mandozzi (2014) and confirmed to have cellwise outliers by Liu et al. (2022). |
| Dataset Splits | Yes | All simulation results in the following are averages over 200 trials with 200 labeled data and 100 test data. 7.1. Airfoil data: We select 1000 labeled data and 500 test data in 100 trials. Since it is unknown which cells are outliers in reality, we artificially introduce outliers with ϵ = 0.02 to construct test features with both genuine and artificial cellwise outliers. The details of the experiment are presented in Appendix E.4. Creating training data, test data, and covariate shift: We repeated an experiment for 200 trials, and for each trial we randomly partition the data {(X_i, Y_i)}_{i=1}^{1000} into two equally sized subsets D_t and D_c, and construct a test set D_test containing cellwise outliers with the following steps. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used for running its experiments, such as GPU models, CPU models, or cloud computing specifications. It only mentions funding sources in the 'Acknowledgements' section, which are not hardware specifications. |
| Software Dependencies | No | The paper mentions software components like 'random forests approach' (implicitly a machine learning library), 'k-Nearest Neighbour', 'Multivariate Imputation by Chained Equations', 'one-class SVM classifier', and 'DDC method'. However, it does not specify any version numbers for these software packages or libraries, which are necessary for reproducibility. |
| Experiment Setup | Yes | The nominal coverage level is set to 1 − α = 90% and d = 15. ...our method is still able to achieve target 1 − α coverage. The empirical TPR (true positive rate) and FDR (false discovery rate) of detection methods are given in Appendix E.2. 6.1. Combinations with other detection methods: This experiment verifies the validity of our methods under other plausible cellwise detection methods besides DDC. Here we consider two procedures: the one-class SVM classifier method (Bates et al., 2023a) with τ_j = 0.2 and the cellMCD estimate method (Raymaekers & Rousseeuw, 2024a) with τ_j = q_{χ²(1), 0.99}, where τ_j is determined to control the FDR (false discovery rate). 6.3. Performance under different contaminated ratios: Here we explore the effect of contamination levels ϵ on our method. We set D as DDC, and the parameter p in the detection threshold τ_j = q_{χ²(1), p} (adjusting p) corresponding to ϵ ∈ {0.1, 0.15, 0.2} to control FDR. Table 1 (empirical TPR and FDR of DDC with different thresholds under Setting A when ϵ = 0.1): for p = 0.99 / 0.9 / 0.7 / 0.5, TPR = 0.987 / 0.992 / 0.995 / 1; FDR = 0.035 / 0.340 / 0.669 / 0.793; PDI coverage = 0.902 / 0.901 / 0.907 / 0.909; JDI coverage = 0.895 / 0.899 / 0.904 / 0.904. |
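The quoted Algorithm 1 (PDI-CP) can be sketched as a split-conformal routine. This is a minimal illustration, not the authors' code: `detect` and `impute` are placeholder callables standing in for a cellwise detection procedure D (e.g. DDC) and an imputation procedure I (e.g. kNN imputation), and the function name `pdi_cp_interval` is invented for this sketch.

```python
import numpy as np

def pdi_cp_interval(X_cal, y_cal, x_test, mu_hat, detect, impute, alpha=0.1):
    """Sketch of PDI-CP: detect-then-impute split conformal prediction.

    detect(x)    -> boolean mask of cells flagged as outlying (placeholder for D)
    impute(x, m) -> x with cells where m is True imputed (placeholder for I)
    mu_hat(x)    -> point prediction from a pre-fitted model
    """
    # Steps 1-2: flag and impute the cells of the test feature.
    o_test = detect(x_test)
    x_test_imp = impute(x_test, o_test)
    # Steps 3-7: impute each calibration point over the union of its own
    # flagged cells and the test point's flagged cells, then score it.
    scores = []
    for x_i, y_i in zip(X_cal, y_cal):
        o_i = detect(x_i)
        x_i_imp = impute(x_i, o_i | o_test)
        scores.append(abs(y_i - mu_hat(x_i_imp)))
    # Step 8: split-conformal quantile q_hat^+_alpha of the residuals.
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    center = mu_hat(x_test_imp)
    return center - q, center + q
```

A toy call with a linear `mu_hat`, a crude |x| > 3 cell flag, and zero imputation (all illustrative choices) returns a finite interval `[lo, hi]` around the imputed test prediction.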