Bivariate Causal Discovery with Proxy Variables: Integral Solving and Beyond
Authors: Yong Wu, Yanwei Fu, Shouyan Wang, Xinwei Sun
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate these findings and the effectiveness of our proposals through comprehensive numerical studies. [...] 7. Experiments: In this section, we evaluate our methods on synthetic data. |
| Researcher Affiliation | Academia | 1Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University 2Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education 3State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, Fudan University 4Zhangjiang Fudan International Innovation Center 5School of Data Science, Fudan University 6Shanghai Engineering Research Center of AI & Robotics, Fudan University 7Engineering Research Center of AI & Robotics, Ministry of Education, Fudan University. Correspondence to: Xinwei Sun <EMAIL>. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. The paper describes methods in narrative text and mathematical formulations. |
| Open Source Code | Yes | Code is available at https://github.com/yezichu/proximal_causal_discovery_cv. |
| Open Datasets | No | Data generation. We follow (Liu et al., 2023) to generate V ∈ {X, Y, U, W} via V = f_V(PA_V) + ε_V, where PA_V and ε_V denote the parent set and the noise of V, respectively. For each V, f_V is randomly selected from {linear, tanh, sin, sqrt}. Besides, the distribution of ε_V is randomly chosen from {Gaussian, uniform, exponential, gamma}. |
| Dataset Splits | No | Each time, we generate 100 replications under each of H0 and H1, and record the type-I error rate and the power. |
| Hardware Specification | No | The computations in this research were performed using the CFFF platform of Fudan University. |
| Software Dependencies | No | For the procedure described in Miao, we implement the R code released with the paper and set l_X = 3, l_W = 2, l_Z = 2 by default. For KCI, we adopt the implementation provided in the causal-learn package: https://causal-learn.readthedocs.io/. |
| Experiment Setup | Yes | We set the significance level α to 0.05. We choose φ and m to be complex exponential functions. For PMCR estimation, we set K = 100 and follow (Mastouri et al., 2021) to select the optimal λ from a sequence ranging from 4.9 × 10⁻⁶ to 0.25, with a step size chosen to ensure the sequence contains 50 values. Besides, we use Gaussian kernels with the bandwidth parameters initialized using the median distance heuristic. For the procedure of Liu (Liu et al., 2023), we follow the paper to set the bin numbers of W and X to l_X = 14, l_W = 12, respectively. For the procedure described in Miao, we implement the R code released with the paper and set l_X = 3, l_W = 2, l_Z = 2 by default. |
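The data-generation recipe quoted above (V = f_V(PA_V) + ε_V with randomly drawn mechanisms and noise distributions) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the choice to sum a node's parents before applying f_V, the sample size, and the graph U → {X, W}, {U, X} → Y are assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate mechanisms and noise distributions named in the paper's setup.
MECHANISMS = {
    "linear": lambda x: x,
    "tanh": np.tanh,
    "sin": np.sin,
    "sqrt": lambda x: np.sqrt(np.abs(x)),  # abs() guard is an assumption
}
NOISES = {
    "Gaussian": lambda n: rng.normal(size=n),
    "uniform": lambda n: rng.uniform(-1.0, 1.0, size=n),
    "exponential": lambda n: rng.exponential(size=n),
    "gamma": lambda n: rng.gamma(2.0, size=n),
}

def simulate_node(parents, n):
    """V = f_V(combined parents) + eps_V, with f_V and eps_V drawn at random."""
    f = MECHANISMS[rng.choice(list(MECHANISMS))]
    eps = NOISES[rng.choice(list(NOISES))](n)
    base = np.sum(parents, axis=0) if parents else np.zeros(n)
    return f(base) + eps

n = 500
U = simulate_node([], n)       # latent confounder
X = simulate_node([U], n)      # treatment
Y = simulate_node([U, X], n)   # outcome (X -> Y edge present, i.e. under H1)
W = simulate_node([U], n)      # proxy of U
```

Dropping X from Y's parent set would give the corresponding H0 (no causal edge) samples.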
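The evaluation protocol quoted under "Dataset Splits" (100 replications per hypothesis, recording type-I error and power) amounts to computing rejection rates at α = 0.05. The sketch below uses a hypothetical stand-in `run_test` returning a p-value in place of the paper's actual test, only to show the bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.05
n_reps = 100

def run_test(h1: bool) -> float:
    """Hypothetical stand-in for the paper's independence test.

    Under H0 p-values are uniform on [0, 1]; under H1 we draw them
    from a Beta(0.2, 1) so they concentrate near 0.
    """
    return rng.beta(0.2, 1.0) if h1 else rng.uniform()

pvals_h0 = np.array([run_test(False) for _ in range(n_reps)])
pvals_h1 = np.array([run_test(True) for _ in range(n_reps)])

type1_error = np.mean(pvals_h0 < alpha)  # rejection rate under H0
power = np.mean(pvals_h1 < alpha)        # rejection rate under H1
```

A calibrated test keeps `type1_error` near α while `power` approaches 1 as the sample size grows.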
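Two hyperparameter choices from the "Experiment Setup" row can likewise be sketched: the 50-value λ grid from 4.9 × 10⁻⁶ to 0.25, and the median-distance heuristic for Gaussian-kernel bandwidths. The geometric spacing of the grid is an assumption here (the quoted text fixes only the endpoints and the count), and `median_heuristic_bandwidth` is an illustrative helper, not the authors' implementation:

```python
import numpy as np

def median_heuristic_bandwidth(X):
    """Set the Gaussian-kernel bandwidth to the median pairwise distance."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Median over the strictly upper-triangular (distinct) pairs.
    return np.median(dists[np.triu_indices(len(X), k=1)])

# 50 candidate regularization strengths between the quoted endpoints;
# geometric spacing is an assumption made for this sketch.
lambdas = np.geomspace(4.9e-6, 0.25, num=50)

rng = np.random.default_rng(0)
sigma = median_heuristic_bandwidth(rng.normal(size=(200, 1)))
```

In the paper's pipeline the optimal λ would then be selected from `lambdas`, e.g. by cross-validation, with kernels initialized at bandwidth `sigma`.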