D-Separation for Causal Self-Explanation
Authors: Wei Liu, Jun Wang, Haozhao Wang, Ruixuan Li, Zhiying Deng, Yuankai Zhang, Yang Qiu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that MCD improves the F1 score by up to 13.7% compared to previous state-of-the-art MMI-based methods. Our code is available at: https://github.com/jugechengzi/Rationalization-MCD. |
| Researcher Affiliation | Collaboration | Wei Liu¹, Jun Wang², Haozhao Wang¹, Ruixuan Li¹, Zhiying Deng¹, Yuankai Zhang¹, Yang Qiu¹ — ¹School of Computer Science and Technology, Huazhong University of Science and Technology; ²iWudao Tech |
| Pseudocode | No | The paper includes architectural diagrams (Figure 3) and mentions PyTorch implementation in the appendix, but it does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | Our code is available at: https://github.com/jugechengzi/Rationalization-MCD. |
| Open Datasets | Yes | Datasets 1) BeerAdvocate (McAuley et al., 2012) is a multi-aspect sentiment prediction dataset widely adopted in rationalization studies. 2) Hotel Reviews (Wang et al., 2010) is another multi-aspect sentiment classification dataset containing less feature correlation. |
| Dataset Splits | No | The paper uses datasets and mentions training, but does not explicitly provide details about training/validation/test splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All models are trained on an RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions software components like GloVe, GRUs, Gumbel-softmax, and Adam, but it does not provide specific version numbers for any of these or for the core programming environment (e.g., Python, PyTorch). |
| Experiment Setup | Yes | We set the sparsity to be similar to previous methods by adjusting the sparsity regularization term (i.e., the target sparsity level s) in Equation 4. |
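The sparsity regularization quoted in the last row follows the constraint commonly used in selective rationalization, which penalizes the gap between the fraction of selected tokens and a target level s. A minimal sketch of that standard form is below; the function name, arguments, and coefficient are illustrative assumptions, not taken from the paper's Equation 4.

```python
def sparsity_regularizer(mask, target_sparsity=0.15, coef=1.0):
    """Standard rationalization sparsity penalty (a sketch, not the paper's exact Eq. 4).

    mask: sequence of per-token selection values in [0, 1]
    target_sparsity: desired fraction s of tokens kept as rationale
    coef: regularization strength (hypothetical value)
    """
    selected_fraction = sum(mask) / len(mask)  # fraction of tokens selected
    return coef * abs(selected_fraction - target_sparsity)
```

Tuning s here is what the quoted setup refers to: raising or lowering the target fraction makes the selected rationales denser or sparser, which is how comparability with prior methods' sparsity levels is achieved.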