reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Causal-learn: Causal Discovery in Python

Authors: Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, Kun Zhang

JMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The causal-learn package also contains extensive usage examples of all search methods, (conditional) independence tests, score functions, and utilities (https://github.com/py-why/causal-learn/tree/main/tests). For instance, causal discovery using PC is as simple as "cg = pc(data)  # apply PC with default parameters". Detailed documentation including all APIs and data structures is available at https://causal-learn.readthedocs.io/en/latest. It also includes a collection of well-tested benchmark datasets. Since ground-truth causal relations are often unknown for real data, evaluation of causal discovery methods has been notoriously known to be hard, and we hope the availability of such benchmark datasets can help alleviate this issue and inspire the collection of more real-world datasets with (at least partially) known causal relations. Functions to import these datasets have also been included in the library.
Researcher Affiliation	Academia	1 Carnegie Mellon University; 2 University of California, San Diego; 3 Guangdong University of Technology; 4 University of Melbourne; 5 Shiga University; 6 Mohamed bin Zayed University of Artificial Intelligence; 7 RIKEN
Pseudocode	No	The paper describes the causal-learn library and outlines various causal discovery methods it incorporates, such as constraint-based, score-based, and function-based approaches, along with their associated algorithms. However, it does not present any structured pseudocode or algorithm blocks within the text.
Open Source Code	Yes	We describe causal-learn, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. [...] The library is available at https://github.com/py-why/causal-learn.
Open Datasets	Yes	It also includes a collection of well-tested benchmark datasets. Since ground-truth causal relations are often unknown for real data, evaluation of causal discovery methods has been notoriously known to be hard, and we hope the availability of such benchmark datasets can help alleviate this issue and inspire the collection of more real-world datasets with (at least partially) known causal relations. Functions to import these datasets have also been included in the library.
Dataset Splits	No	The paper mentions a collection of well-tested benchmark datasets and includes functions to import them within the causal-learn library. However, it does not specify any particular training/test/validation splits, percentages, or methodologies for partitioning these datasets.
Hardware Specification	No	The paper describes the causal-learn library and its features but does not provide any specific details regarding the hardware (e.g., GPU/CPU models, memory specifications, cloud instances) used for running experiments or evaluations.
Software Dependencies	No	The paper states that "causal-learn is fully developed in Python" and mentions other packages that can be used with it (e.g., DoWhy and Ananke). However, it does not provide specific version numbers for Python or any other key software libraries that causal-learn depends on for its functionality.
Experiment Setup	No	The paper describes the causal-learn library and the various algorithms it implements, mentioning general usage examples with benchmark datasets. However, it does not provide specific experimental setup details such as hyperparameter values, learning rates, batch sizes, or other training configurations for any particular experiment.
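
The one-line PC call quoted in the Research Type row can be expanded into a small script. What follows is a minimal sketch and is not taken from the paper: the synthetic chain X -> Y -> Z, the coefficients, the sample size, and the helper names make_chain_data and run_pc are all illustrative assumptions. Only the import path causallearn.search.ConstraintBased.PC, the call pc(data), and the adjacency matrix at cg.G.graph come from the causal-learn library, which must be installed separately along with NumPy.

```python
import random


def make_chain_data(n_samples=500, seed=0):
    """Sample a linear-Gaussian chain X -> Y -> Z (hypothetical toy data).

    Returns a list of [x, y, z] rows. The coefficients are arbitrary
    choices for illustration, not values from the paper.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        z = -1.5 * y + rng.gauss(0.0, 1.0)
        rows.append([x, y, z])
    return rows


def run_pc(rows):
    """Run PC with default parameters and return the adjacency matrix.

    Third-party imports are kept inside the function so the data helper
    above can be used without causal-learn installed.
    """
    import numpy as np
    from causallearn.search.ConstraintBased.PC import pc

    data = np.asarray(rows, dtype=float)  # shape (n_samples, n_variables)
    cg = pc(data)  # same call as quoted from the paper
    return cg.G.graph  # NumPy array encoding the estimated graph's edges
```

With causal-learn installed, run_pc(make_chain_data()) returns a 3 x 3 matrix describing the estimated graph over X, Y, and Z. Because the paper reports no library versions or hyperparameters (see the Software Dependencies and Experiment Setup rows), the exact result may depend on the installed version and its defaults.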