reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Python package for causal discovery based on LiNGAM

Authors: Takashi Ikeuchi, Mayumi Ide, Yan Zeng, Takashi Nicholas Maeda, Shohei Shimizu

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We compared the accuracy and runtime of our implementation of the ICA-based LiNGAM algorithm with those of an existing package, pcalg, for different numbers of variables. We also tested our implementation of DirectLiNGAM for comparison. [...] Fig. 1 shows that our implementation of DirectLiNGAM was more accurate than our and pcalg implementations of ICA-based LiNGAM.
Researcher Affiliation	Collaboration	Takashi Ikeuchi EMAIL Mayumi Ide EMAIL SCREEN Advanced System Solutions Co., Ltd., Japan Yan Zeng EMAIL Department of Computer Science and Technology, Tsinghua University, China Takashi Nicholas Maeda EMAIL School of System Design and Technology, Tokyo Denki University, Japan Center for Advanced Intelligence Project, RIKEN, Japan Shohei Shimizu EMAIL Faculty of Data Science, Shiga University, Japan Center for Advanced Intelligence Project, RIKEN, Japan
Pseudocode	No	The paper provides a brief code snippet for model instantiation and fitting: 'model = lingam.DirectLiNGAM() model.fit(X)', but this is not a structured pseudocode or algorithm block.
Open Source Code	Yes	The source code is freely available under the MIT license at https://github.com/cdt15/lingam.
Open Datasets	No	The paper states: 'The python code used to generate artificial data in our experiments is available at https://github.com/cdt15/lingam/blob/master/examples/data/GenerateDatasets.ipynb.' This is code to generate artificial data, not an external publicly available dataset used for evaluation.
Dataset Splits	No	The paper mentions varying sample sizes (e.g., sample=200, sample=1000, sample=5000) and dimensions (e.g., dim=10, dim=50, dim=100) for artificial data generation but does not provide specific training/test/validation splits or cross-validation details for the experiments.
Hardware Specification	No	The paper does not mention any specific hardware details (GPU/CPU models, memory, etc.) used for running its experiments.
Software Dependencies	No	The paper mentions the use of 'scikit-learn' and refers to a 'Python package' for causal discovery, but it does not specify any version numbers for these or any other software dependencies.
Experiment Setup	No	The paper describes the methods and compares their accuracy (SHD) and runtime across different dimensions and sample sizes. However, it does not provide specific experimental setup details such as hyperparameters, optimizer settings, or other configuration parameters for the algorithms or their estimation processes.
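
The Experiment Setup entry above names SHD (structural Hamming distance) as the accuracy measure. As a rough illustration of what that metric counts, here is a minimal sketch in plain Python. It is not taken from the paper or from the lingam package: the function name and the example matrices are hypothetical, and it assumes one common convention in which a missing, extra, or reversed edge each counts as a single error. Other SHD variants count a reversal as two errors, and the paper's exact convention is not stated here.

```python
def structural_hamming_distance(true_adj, est_adj):
    """Count node pairs whose edge status differs between two directed graphs.

    Both arguments are square nested lists; any nonzero entry marks a
    directed edge. The two matrices must share the same orientation
    convention. Assumption: a reversed edge counts as one error.
    """
    n = len(true_adj)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Edge status of the unordered pair {i, j} in each graph.
            true_pair = (bool(true_adj[i][j]), bool(true_adj[j][i]))
            est_pair = (bool(est_adj[i][j]), bool(est_adj[j][i]))
            if true_pair != est_pair:
                distance += 1
    return distance


# Hypothetical example, reading adj[i][j] != 0 as an edge from j to i.
# True graph: 0 -> 1, 1 -> 2.
true_graph = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
]
# Estimated graph: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra).
estimated_graph = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
]

shd = structural_hamming_distance(true_graph, estimated_graph)
print(shd)  # one reversed edge plus one extra edge
```

A lower value means the estimated graph is closer to the true one; identical graphs give 0.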