On Noise Abduction for Answering Counterfactual Queries: A Practical Outlook
Authors: Saptarshi Saha, Utpal Garain
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report experimental results on both synthetic and real-world German Credit Dataset, showcasing the promise and usefulness of the proposed exogenous noise identification. |
| Researcher Affiliation | Academia | Saptarshi Saha, Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata; Utpal Garain, Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata |
| Pseudocode | No | The paper describes a "four-step procedure" for computing a counterfactual query in the SCM framework, which is presented as a numbered list within a paragraph in Section 5. It is not formatted as a distinct pseudocode or algorithm block. |
| Open Source Code | Yes | The code for reproducing the results is available at https://github.com/Saptarshi-Saha-1996/Noise-Abduction-for-Counterfactuals. |
| Open Datasets | Yes | We report experimental results on both synthetic and real-world German Credit Dataset, showcasing the promise and usefulness of the proposed exogenous noise identification. Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017. URL http://archive.ics.uci.edu/ml. |
| Dataset Splits | No | The paper mentions generating 20000 data points for the synthetic dataset and distinguishes between 'seen' (training and validation) and 'unseen' (test) data for evaluation, stating: "By seen datapoints, we mean these are the datapoints used in training and validation. MSE in estimation of counterfactuals on unseen data points (test MSE) is given in Appendix C". However, it does not provide specific percentages or absolute counts for how the data was split into training, validation, and test sets for either the synthetic or German Credit datasets, nor does it refer to predefined splits with citations or detailed splitting methodologies. |
| Hardware Specification | Yes | Both models are trained for 1000 epochs using 12th Gen Intel(R) Core(TM) i9-12900KF CPU. All instances of both models are trained for 500 epochs using NVIDIA RTX A5000 GPU. |
| Software Dependencies | No | We use the Pyro (Bingham et al., 2019) probabilistic programming language (PPL) framework for the implementation of the flow-based SCM. Pyro is a PPL based on PyTorch (Paszke et al., 2019). Adam (Kingma & Ba, 2015) with batch-size 128, an initial learning rate of 10^-3 is used for optimization purposes. Specific version numbers for Pyro, PyTorch, or Adam are not provided. |
| Experiment Setup | Yes | Adam (Kingma & Ba, 2015) with batch-size 128, an initial learning rate of 10^-3 is used for optimization purposes. Both models are trained for 1000 epochs using 12th Gen Intel(R) Core(TM) i9-12900KF CPU. Adam (Kingma & Ba, 2015) with a batch-size of 64, an initial learning rate of 3×10^-4, and weight decay of 10^-4 are used in training. We use a staircase learning rate schedule with decay milestones at 50% and 75% of the training duration. All instances of both models are trained for 500 epochs using NVIDIA RTX A5000 GPU. |
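The "four-step procedure" mentioned in the Pseudocode row builds on the classic abduction-action-prediction recipe for SCM counterfactuals. A minimal sketch of that recipe on a toy additive-noise SCM is below; the mechanisms and variable names are illustrative assumptions, not the paper's actual model.

```python
# Toy additive-noise SCM:  X := U_x,  Y := 2*X + U_y.
# Counterfactual query: "What would Y have been, had X been x_new,
# given that we observed (x_obs, y_obs)?"

def f_y(x, u_y):
    """Structural mechanism for Y (toy example, not from the paper)."""
    return 2 * x + u_y

def counterfactual_y(x_obs, y_obs, x_new):
    # 1. Abduction: invert the mechanism to recover the exogenous noise
    #    value consistent with the observed evidence.
    u_y = y_obs - 2 * x_obs
    # 2. Action: intervene do(X = x_new), replacing X's own mechanism.
    # 3. Prediction: push the abducted noise through the modified model.
    return f_y(x_new, u_y)

print(counterfactual_y(x_obs=1.0, y_obs=2.5, x_new=3.0))  # 6.5
```

The paper's contribution concerns abducting only a reduced set of exogenous noise variables rather than all of them; this sketch shows only the generic recipe that such a reduction would operate within.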
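The second training configuration quoted in the Experiment Setup row (Adam, batch size 64, lr 3×10^-4, weight decay 10^-4, staircase decay at 50% and 75% of 500 epochs) maps naturally onto PyTorch's `MultiStepLR`. A sketch under stated assumptions: the decay factor is not reported, so `gamma=0.1` here is a guess, and the `Linear` model is a stand-in for the actual flow-based SCM.

```python
import torch

model = torch.nn.Linear(10, 1)  # stand-in for the paper's flow-based SCM
epochs = 500

# Adam with the quoted hyperparameters.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)

# Staircase schedule: decay at 50% and 75% of training (epochs 250 and 375).
# gamma=0.1 is an assumption; the report does not state the decay factor.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[epochs // 2, epochs * 3 // 4],
    gamma=0.1,
)

for epoch in range(epochs):
    # ... real loop would iterate over batches of size 64, compute the
    # loss, and call loss.backward() before stepping ...
    optimizer.step()   # placeholder step for illustration
    scheduler.step()   # advance the staircase schedule once per epoch
```

With these assumptions the learning rate ends at 3e-4 × 0.1² = 3e-6 after both milestones have passed.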