Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
Authors: Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga, Alberto Termine, Martin Gjoreski, Giuseppe Marra, Marc Langheinrich
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that Causal CGMs can: (i) match the generalisation performance of causally opaque models, (ii) enable human-in-the-loop corrections to mispredicted intermediate reasoning steps, boosting not just downstream accuracy after corrections but also the reliability of the explanations provided for specific instances, and (iii) support the analysis of interventional and counterfactual scenarios, thereby improving the model's causal interpretability and supporting the effective verification of its reliability and fairness. |
| Researcher Affiliation | Collaboration | Gabriele Dominici, Università della Svizzera italiana, EMAIL; Pietro Barbiero, IBM Research, EMAIL; Mateo Espinosa Zarlenga, University of Cambridge, EMAIL; Alberto Termine, IDSIA, EMAIL; Martin Gjoreski, Università della Svizzera italiana, EMAIL; Giuseppe Marra, KU Leuven, EMAIL; Marc Langheinrich, Università della Svizzera italiana, EMAIL |
| Pseudocode | No | The paper describes methods and processes in paragraph form and through equations but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code related to this paper is publicly available: https://github.com/gabriele-dominici/CausalCGM |
| Open Datasets | Yes | To answer these questions, we use four datasets: (i) Checkmark, a synthetic dataset composed of four endogenous variables; (ii) dSprites, where endogenous variables correspond to object types together with their position, colour, and shape; (iii) CelebA, a facial recognition dataset where endogenous variables represent facial attributes; (iv) CIFAR10, an animal classification dataset where the endogenous variables are extracted automatically following Oikarinen et al. (2023). ... dSprites (Matthey et al., 2017) ... CelebA (Liu et al., 2015) ... CIFAR-10 (Krizhevsky, 2009) |
| Dataset Splits | No | The paper mentions using a 'validation set' to determine the optimal epoch for training, but it does not specify the exact percentages or counts for training, validation, or test splits. For example, in Section G.4: 'The optimal epoch for each was determined based on label accuracy on the validation set.' |
| Hardware Specification | Yes | All the experiments except the CIFAR10 ones were performed on a device equipped with an M3 Max and 36GB of RAM, without the use of a GPU. The CIFAR10 experiments were conducted on a workstation equipped with an NVIDIA RTX A6000 GPU, two AMD EPYC 7513 32-core processors, and 512 GB of RAM. |
| Software Dependencies | Yes | For our experiments, we implement all baselines and methods in Python 3.9 and relied upon open-source libraries such as PyTorch 2.0 (Paszke et al., 2019) (BSD license), PyTorch Lightning v2.1.2 (Apache License 2.0), and scikit-learn 1.2 (Pedregosa et al., 2011) (BSD license). In addition, we used Matplotlib 3.7 (Hunter, 2007) (BSD license) to produce the plots shown in this paper. |
| Experiment Setup | Yes | Hyperparameters: All baseline and proposed models were trained for varying epochs across different datasets: 500 for Checkmark, 200 for dSprites, 30 for CelebA, and 25 for CIFAR10. The optimal epoch for each was determined based on label accuracy on the validation set. A uniform learning rate of 0.01 was applied across all models and datasets. For the CBM and CEM models, both concept and task losses were equally weighted at 1. This weighting scheme was also applied to the loss terms for endogenous copies prediction, endogenous variables prediction (λ1), and graph priors (λ2). The weight assigned to the loss terms in our models to maximise CaCE is 0.05. Additionally, γ was treated as a learnable parameter, initialised at 0.1, and β was set to 1. All experiments were conducted using five different seeds (1, 2, 3, 4, 5). |
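The hyperparameters quoted above can be collected into a single configuration, which makes the reported setup easier to audit at a glance. This is a minimal illustrative sketch, not the authors' code: the `HPARAMS` dict and the `total_loss` helper are hypothetical names, and only the numeric values come from the paper's excerpt.

```python
# Hypothetical configuration mirroring the hyperparameters reported in the
# Experiment Setup row; structure and names are illustrative, values are quoted.
HPARAMS = {
    "epochs": {"Checkmark": 500, "dSprites": 200, "CelebA": 30, "CIFAR10": 25},
    "learning_rate": 0.01,          # uniform across all models and datasets
    "loss_weights": {
        "concept": 1.0,             # CBM/CEM concept loss
        "task": 1.0,                # downstream task loss
        "lambda_1": 1.0,            # endogenous variables prediction
        "lambda_2": 1.0,            # graph priors
        "cace": 0.05,               # term maximising CaCE
    },
    "gamma_init": 0.1,              # gamma is learnable, initialised at 0.1
    "beta": 1.0,
    "seeds": [1, 2, 3, 4, 5],       # five seeds per experiment
}

def total_loss(losses: dict) -> float:
    """Weighted sum of the individual loss terms (illustrative helper)."""
    return sum(HPARAMS["loss_weights"][name] * value
               for name, value in losses.items())
```

Grouping the weights this way also makes the equal weighting of the concept/task terms and the much smaller CaCE weight (0.05) immediately visible.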