Extracting Rare Dependence Patterns via Adaptive Sample Reweighting

Authors: Yiqing Li, Yewei Xia, Xiaofei Wang, Zhengming Chen, Liuhua Peng, Mingming Gong, Kun Zhang

ICML 2025

Reproducibility Variable — Result — LLM Response

Research Type: Experimental
  "Empirical evaluation of synthetic and real-world datasets comprehensively demonstrates the efficacy of our method." "Empirically, we conduct extensive experiments on synthetic and real-world data that demonstrate the efficacy of our method." "We apply the proposed testing method to both synthetic and real data to evaluate their performance."

Researcher Affiliation: Academia
  1 Department of Machine Learning, Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; 2 Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China; 3 KLAS and School of Mathematics and Statistics, Northeast Normal University, Changchun, China; 4 College of Mathematics and Computer, Shantou University, Shantou, China; 5 School of Mathematics and Statistics, The University of Melbourne, Melbourne, VIC, Australia; 6 Department of Philosophy, Carnegie Mellon University, Pittsburgh, USA.

Pseudocode: Yes
  Algorithm 1: Reweighted HSIC (RHSIC); Algorithm 2: Rare Dependence PC (RD-PC).

Open Source Code: Yes
  "Codes are available at https://github.com/leeedwina430/RKCIT."

Open Datasets: Yes
  "Sachs Dataset. We apply our RHSIC to a flow cytometry dataset (Sachs et al., 2005)..." "Financial Dataset. We also apply our method to monthly JPY/USD exchange rates (E) and U.S. federal funds rates (F) from 1990 to 2010, sourced from Federal Reserve Economic Data (FRED)."

Dataset Splits: Yes
  "We randomly split it into disjoint training (Dtr) and testing (Dte) data. The split ratio is set to 0.5."

Hardware Specification: No
  The paper does not describe the hardware used to run its experiments, such as specific GPU or CPU models.

Software Dependencies: No
  The paper mentions Python libraries such as scipy and causal-learn and refers to other codebases, but it does not specify version numbers for these software components.

Experiment Setup: Yes
  "The significance level is set to 0.05." "The results are obtained after averaging the values in the 100 tests." "We set the number of permutations to 2000 to approximate the null distribution." "The hyperparameters in our objective functions (9) are set to λ1 = λ2 = 1e-3 for RHSIC and λ1 = 1e-6, λ2 = 1e-1 for RKCIT. And the ϵ for kernel ridge regression is set to 1e-3."
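To make the reported testing protocol concrete (significance level 0.05, 2000 permutations to approximate the null distribution), here is a minimal sketch of a plain permutation-based HSIC independence test in Python with NumPy. This is not the authors' RHSIC: it omits the learned sample reweighting and the 0.5 train/test split, and the RBF bandwidth and toy data are assumptions for illustration only.

```python
import numpy as np

def rbf_gram(v, sigma=1.0):
    # RBF (Gaussian) Gram matrix for a 1-D sample; the bandwidth
    # sigma=1.0 is an assumed default, not a value from the paper.
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic_permutation_test(x, y, n_perm=2000, seed=0):
    """Plain (unweighted) HSIC with a permutation null, following the
    reported protocol of 2000 permutations."""
    rng = np.random.default_rng(seed)
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Kxc = H @ rbf_gram(x) @ H                     # centered Gram of x
    Ky = rbf_gram(y)
    stat = np.sum(Kxc * Ky) / n**2                # biased HSIC estimate
    null = np.empty(n_perm)
    for i in range(n_perm):
        p = rng.permutation(n)                    # permuting y breaks dependence
        null[i] = np.sum(Kxc * Ky[np.ix_(p, p)]) / n**2
    # Permutation p-value with the +1 correction.
    return (1 + np.sum(null >= stat)) / (1 + n_perm)

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y_dep = x**2 + 0.1 * rng.normal(size=100)         # strongly dependent on x
y_ind = rng.normal(size=100)                      # independent of x
p_dep = hsic_permutation_test(x, y_dep)           # small: rejects at alpha = 0.05
p_ind = hsic_permutation_test(x, y_ind)
```

The elementwise form `np.sum(Kxc * Ky)` equals `trace(Kx H Ky H)` because the Gram matrices are symmetric; centering `Kx` once lets each of the 2000 permutations reuse it, so only `Ky` is re-indexed per permutation.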