Nuance Matters: Probing Epistemic Consistency in Causal Reasoning

Authors: Shaobo Cui, Junyou Li, Luca Mouchel, Yiyang Feng, Boi Faltings

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive empirical studies on 21 high-profile LLMs, including GPT-4, Claude 3, and LLaMA3-70B, we find evidence that current models struggle to maintain epistemic consistency in identifying the polarity and intensity of intermediates in causal reasoning.
Researcher Affiliation | Academia | Shaobo Cui (1), Junyou Li (2), Luca Mouchel (1), Yiyang Feng (1), Boi Faltings (1); (1) EPFL, Switzerland; (2) University of Waterloo, Canada. shaobo.cui@epfl.ch, EMAIL, luca.mouchel@epfl.ch, yiyang.feng@epfl.ch, boi.faltings@epfl.ch
Pseudocode | No | The paper describes methods and metrics using mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/cui-shaobo/causal-consistency
Open Datasets | Yes | To ensure the defeasibility of causal pairs, allowing models to generate intermediates with varying polarity and intensity, we utilize the test dataset of ε-CAUSAL (Cui et al. 2024b) as our foundational dataset, which comprises 1,970 defeasible cause-effect pairs.
Dataset Splits | Yes | We utilize the test dataset of ε-CAUSAL (Cui et al. 2024b) as our foundational dataset, which comprises 1,970 defeasible cause-effect pairs.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. It only generally acknowledges IT and financial support.
Software Dependencies | No | The paper mentions several software libraries in its references (e.g., NumPy, Matplotlib, PyTorch, Transformers, NLTK, Accelerate) but does not provide specific version numbers for these or any other ancillary software dependencies used in the experimental setup.
Experiment Setup | Yes | (i) Intermediate generation: The prompt for generating these fine-grained intermediates is presented in Figure 6; (ii) Intermediate ranking: From these generated intermediates, we use the same LLM to rank the intermediates to identify their polarities (supporting or defeating) and intensities. The prompt for ranking these fine-grained intermediates is presented in Figure 7.
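The two-step setup quoted above (the same LLM first generates intermediates for a cause-effect pair, then ranks its own outputs by polarity and intensity) could be sketched as below. This is a minimal illustration, not the authors' implementation: the function names and the `llm` callable are hypothetical stand-ins, and the actual prompts are those shown in Figures 6 and 7 of the paper.

```python
# Hypothetical sketch of the paper's two-step probing pipeline.
# `llm` is any callable that maps a prompt string to a response string.

def generate_intermediates(llm, cause, effect, n=4):
    """Step (i): ask the LLM for n fine-grained intermediates
    (supporters/defeaters) for a defeasible cause-effect pair."""
    prompt = (
        f"Cause: {cause}\nEffect: {effect}\n"
        f"List {n} intermediate statements, each either supporting "
        f"or defeating this causal link, at varying intensities."
    )
    return llm(prompt)

def rank_intermediates(llm, cause, effect, intermediates):
    """Step (ii): ask the *same* LLM to rank its own intermediates
    by polarity (supporting vs. defeating) and intensity."""
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(intermediates))
    prompt = (
        f"Cause: {cause}\nEffect: {effect}\n"
        f"Intermediates:\n{numbered}\n"
        "Rank these from strongest supporter to strongest defeater."
    )
    return llm(prompt)
```

Epistemic consistency is then probed by checking whether the ranking produced in step (ii) agrees with the polarity and intensity the model itself intended in step (i).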