A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery
Authors: Yingyu Lin, Yuxing Huang, Wenqin Liu, Haoran Deng, Ignavier Ng, Kun Zhang, Mingming Gong, Yian Ma, Biwei Huang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies further validate the effectiveness of the proposed method. --- We conduct experiments using synthetic data generated from HSNMs in three settings: (1) bivariate HSNMs; (2) latent-confounded triangular HSNM setting (bivariate model with a latent confounder, as discussed in Section 5); and (3) multivariate HSNMs. --- Metrics. For the pairwise causal discovery settings, we report the accuracy of the estimated causal directions. |
| Researcher Affiliation | Academia | 1 UC San Diego, 2 New York University, 3 The University of Melbourne, 4 Zhejiang University, 5 Carnegie Mellon University, 6 Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | Yes | Algorithm 1 (Skew Score) — 1: Input: Data matrix X ∈ ℝ^{n×d}, test odd function ψ(s) = s³, conditional independence test oracle CI (returns p-value), and significance level α ∈ (0, 1). |
| Open Source Code | No | The paper does not contain a clear and unambiguous statement about the release of source code, nor does it provide a direct link to a code repository. The text mentions that 'our method can easily be deployed on GPUs', but this does not constitute a code release. |
| Open Datasets | Yes | Our method's accuracy is 62.6% on the Tübingen cause-effect pairs dataset, excluding multivariate and binary pairs (47, 52-55, 70, 71, 105 and 107), as in Immer et al. (2023). |
| Dataset Splits | No | The paper mentions 'The sample size for all experiments is 5000' and 'we subsample 1000 points for conditional independence testing'. It also discusses averaging results over '100 independent runs' or '10 runs', and varying sample sizes in Appendix J (n = {100, 1000, 10000}). However, it does not specify explicit training, validation, and test splits (e.g., as percentages or absolute counts) that would be needed to reproduce the data partitioning for model training and evaluation. |
| Hardware Specification | Yes | All experiments are on 12 CPU cores with 24 GB RAM. |
| Software Dependencies | No | The paper mentions the use of 'sliced score matching (SSM) with a 3-layer MLP' and 'Adam optimizer', but it does not specify any version numbers for these or other software libraries/dependencies. For example, it does not state 'Python 3.x', 'PyTorch 1.x', etc. |
| Experiment Setup | Yes | The score estimator employs sliced score matching (SSM) with a 3-layer MLP. We use a batch size of 128 and configure the hidden dimensions to 128 for d < 10 and 512 for d >= 10. Optimization is performed using the Adam optimizer with a learning rate of 10^-3, and we subsample 1000 points for conditional independence testing after obtaining the topological order. |
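The odd test function ψ(s) = s³ quoted in the pseudocode row acts as a skewness statistic: for a standardized variable r, E[ψ(r)] is its sample skewness. The toy sketch below is *not* the paper's Algorithm 1 (which uses score functions and a conditional-independence oracle); it is a hypothetical bivariate illustration of why such a statistic is informative for causal direction. In a linear model with skewed noise, regression residuals in the causal direction equal the noise and keep its skew, while residuals in the reverse direction mix two independent variables and are less skewed. The decision rule (pick the direction whose residuals have larger |skewness|) is an assumption made here for illustration only.

```python
# Hypothetical illustration only -- not the paper's Skew Score algorithm.
import math
import random

def skew_stat(r):
    """Sample skewness: E[psi(r_std)] with the odd test function psi(s) = s^3."""
    n = len(r)
    mu = sum(r) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in r) / n)
    return sum(((v - mu) / sd) ** 3 for v in r) / n

def ols_residuals(x, y):
    """Residuals of regressing y on x (simple OLS with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) \
        / sum((a - mx) ** 2 for a in x)
    return [b - my - beta * (a - mx) for a, b in zip(x, y)]

random.seed(0)
n = 5000  # matches the sample size reported in the table above
x = [random.gauss(0.0, 1.0) for _ in range(n)]
noise = [random.expovariate(1.0) - 1.0 for _ in range(n)]  # skewed noise
y = [a + e for a, e in zip(x, noise)]  # ground truth: x -> y

s_causal = abs(skew_stat(ols_residuals(x, y)))  # residual keeps the noise skew
s_anti = abs(skew_stat(ols_residuals(y, x)))    # mixture dilutes the skew

direction = "x -> y" if s_causal > s_anti else "y -> x"
print(direction)
```

With this seed the causal residuals' |skewness| (close to the exponential noise's skewness of 2) should clearly exceed the anti-causal residuals', so the toy rule recovers `x -> y`.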