Statistical Test for Feature Selection Pipelines by Selective Inference

Authors: Tomohiro Shiraishi, Tatsuya Matsukawa, Shuichi Nishino, Ichiro Takeuchi

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We theoretically prove that our statistical test can control the probability of false positive feature selection at any desired level, and demonstrate its validity and effectiveness through experiments on synthetic and real data. Additionally, we present an implementation framework that facilitates testing across any configuration of these feature selection pipelines without extra implementation costs.
Researcher Affiliation Academia Equal contribution. 1: Nagoya University, Aichi, Japan; 2: RIKEN, Tokyo, Japan. Correspondence to: Ichiro Takeuchi <EMAIL>.
Pseudocode Yes The overall procedure for computing the interval [Lz, Uz] by applying the update rules in the order of the topological sorting of the DAG is summarized in Algorithm 1, where the operation pa receives the index of the target node and returns the indexes of its parent nodes, and pa(1) is set to 0. Algorithm 1 satisfies the specifications described in Section 4.2, i.e., the following theorem holds.
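As a rough illustration of the traversal structure only (not the paper's actual update rules, which operate on selection events of each pipeline component), the sketch below assumes each node contributes a feasible interval that is intersected while visiting nodes in topological order, mirroring how Algorithm 1 applies per-node updates along the DAG:

```python
from graphlib import TopologicalSorter

def compute_interval(node_intervals, parents):
    """Intersect per-node intervals in topological order of the DAG.

    parents maps each node to the set of its parent nodes (the role of
    the operation pa in the paper); node_intervals maps each node to a
    hypothetical feasible interval (lk, uk) it contributes.
    """
    order = TopologicalSorter(parents).static_order()
    l, u = float("-inf"), float("inf")
    for k in order:
        lk, uk = node_intervals[k]
        l, u = max(l, lk), min(u, uk)  # intersect feasible regions
    return l, u

# Toy chain DAG 1 -> 2 -> 3 (node 1 has no parents, as with pa(1) = 0)
parents = {1: set(), 2: {1}, 3: {2}}
intervals = {1: (-5.0, 4.0), 2: (-2.0, 6.0), 3: (-3.0, 3.0)}
print(compute_interval(intervals, parents))  # (-2.0, 3.0)
```

The topological ordering guarantees every node is processed only after its parents, which is the property the paper's update rules rely on.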
Open Source Code Yes For reproducibility, our experimental code is available at https://github.com/shirara1016/statistical_test_for_feature_selection_pipelines.
Open Datasets Yes We compared the proposed and w/o-pp in terms of power, for the cv pipeline on eight real-world datasets from the UCI Machine Learning Repository (all licensed under the CC BY 4.0; see Appendix D.5 for more details).
Dataset Splits Yes For the experiments to see the type I error rate, we change the number of samples n ∈ {100, 200, 300, 400} and set the number of features d to 20. ... For each configuration, we generated 10,000 null datasets (X, y)... Missing values were introduced by randomly setting each yi to NaN with a probability of 0.03. ... From each original dataset, we randomly generated 1,000 sub-sampled datasets with sample sizes of n ∈ {100, 150, 200}.
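A minimal sketch of the data-preparation steps quoted above (the design and noise distributions are assumed standard normal here, since the quote does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

# One null dataset: y is independent of X, so any selected feature
# is a false positive by construction.
n, d = 200, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Missing values: each y_i becomes NaN with probability 0.03.
mask = rng.random(n) < 0.03
y[mask] = np.nan

# Sub-sampling, as in the real-data power study: draw a smaller
# dataset of size 100 without replacement.
idx = rng.choice(n, size=100, replace=False)
X_sub, y_sub = X[idx], y[idx]
print(X_sub.shape, y_sub.shape)
```

Repeating the first block 10,000 times per configuration, and the sub-sampling 1,000 times per original dataset, reproduces the experimental design described in the quote.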
Hardware Specification Yes All numerical experiments were conducted on a computer with a 96-core 3.60GHz CPU and 512GB of memory.
Software Dependencies No "import numpy as np" and "from si4pipeline import *" The paper mentions using 'numpy' and a custom package 'si4pipeline' but does not provide specific version numbers for these software dependencies.
Experiment Setup Yes In all experiments, we set the significance level α = 0.05. For the experiments to see the type I error rate, we change the number of samples n ∈ {100, 200, 300, 400} and set the number of features d to 20. ... To investigate the power, we set n = 200 and d = 20... We change the true coefficients in {0.2, 0.4, 0.6, 0.8}. ... Missing values were introduced by randomly setting each yi to NaN with a probability of 0.03.
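The type I error rate experiment amounts to a Monte Carlo check: under the null, p-values from a valid test are uniform, so the rejection rate at α = 0.05 should be close to 0.05. The sketch below illustrates this with an ordinary one-sample t-test standing in for the paper's selective test, which is not reimplemented here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05

# Repeatedly generate null data and count rejections; for a test that
# controls the type I error rate, the fraction rejected is ~ alpha.
n_trials, n = 10_000, 100
rejections = 0
for _ in range(n_trials):
    y = rng.standard_normal(n)     # null: true mean is zero
    _, p = stats.ttest_1samp(y, 0.0)
    rejections += p < alpha

rate = rejections / n_trials
print(rate)  # close to 0.05 up to Monte Carlo error
```

The paper's experiments apply the same logic with 10,000 null datasets per configuration, but with p-values computed by the proposed selective inference procedure rather than a classical test.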