Rethinking Visual Counterfactual Explanations Through Region Constraint
Authors: Bartlomiej Sobieski, Jakub Grzywaczewski, Bartłomiej Sadlej, Matthew Tivnan, Przemyslaw Biecek
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through large-scale experiments, we demonstrate that, besides a fully automated way of synthesizing meaningful and highly interpretable RVCEs, our approach, Region-constrained Counterfactual Schrödinger Bridge (RCSB), allows to infer causally about the model's change in prediction and enables the user to actively interact with the explanatory process by manually defining the region of interest. (...) 4 EXPERIMENTS — Zebra → Sorrel task (FID / sFID / S3 / COUT / FR): ACE ℓ1: 84.5 / 122.7 / 0.92 / 0.45 / 47.0; ACE ℓ2: 67.7 / 98.4 / 0.90 / 0.25 / 81.0; LDCE-cls: 84.2 / 107.2 / 0.78 / 0.06 / 88.0; LDCE-txt: 82.4 / 107.2 / 0.71 / 0.21 / 81.0; DVCE: 33.1 / 43.9 / 0.62 / 0.21 / 57.8; RCSB-C: 13.0 / 20.4 / 0.82 / 0.70 / 99.7; RCSB-B: 9.51 / 17.4 / 0.86 / 0.72 / 97.4; RCSB-A: 8.0 / 16.2 / 0.88 / 0.74 / 94.7 |
| Researcher Affiliation | Academia | Bartlomiej Sobieski (University of Warsaw); Jakub Grzywaczewski (Warsaw University of Technology); Bartlomiej Sadlej (University of Warsaw); Matthew Tivnan (Harvard Medical School); Przemyslaw Biecek (University of Warsaw, Warsaw University of Technology) |
| Pseudocode | Yes | For the pseudocode of the entire procedure, see Appendix. We include our implementation at https://github.com/sobieskibj/rcsb. (...) Appendix A, Pseudocode. Algorithm 1 (Standard I2SB Generation): input x_N ∼ p₁(x_N) and trained sψ(·,·); for n = N down to 1: predict x̂₀(x_n) using sψ(x_n, t_n), then sample x_{n−1} ∼ p(x_{n−1} \| x̂₀, x_n) according to DDPM; return x₀. Algorithm 2 (OT-ODE I2SB Generation): same loop, but with the deterministic update x_{n−1} = μ_{n−1} x̂₀ + μ̄_{n−1} x_n. Algorithm 3 (RCSB): input number of steps N, binary region mask R, trajectory truncation τ, classifier scale s, input image x*, trained sψ(·,·), trained classifier f(y \| ·), target class y; set x₁ = (1−R) ⊙ x* + R ⊙ z, where z ∼ N(z; 0, I); discretize the truncated timeline 0 = t₀ < t₁ < … < t_N = τ; sample x_N ∼ q(x_N \| x₀, x₁) from the analytic posterior (Eq. 15); for n = N down to 1: predict x̂₀(x_n) using sψ(x_n, t_n); g_n = ∇_{x_n} log f(y \| x̂₀); g_n = ADAM(g_n); if n = N, register the norm of the first gradient, g = ‖g_N‖₂; x_n ← x_n + s · g_n / g; x_{n−1} = μ_{n−1} x̂₀ + μ̄_{n−1} x_n; return x₀. Algorithm 4 (ADAM Update Rule): input gradient g_n at step n, hyperparameters α, ϵ, β₁, β₂ (set to PyTorch (Paszke et al., 2019) defaults); m_n = β₁ m_{n−1} + (1−β₁) g_n (biased first-moment estimate); v_n = β₂ v_{n−1} + (1−β₂) g_n² (biased second-moment estimate); m̂_n = m_n / (1−β₁ⁿ), v̂_n = v_n / (1−β₂ⁿ) (bias correction); return the updated gradient g_n = α m̂_n / (√v̂_n + ϵ). |
| Open Source Code | Yes | We include our implementation at https://github.com/sobieskibj/rcsb. |
| Open Datasets | Yes | Specifically, we set a new quantitative state-of-the-art (SOTA) on ImageNet (Deng et al., 2009) with up to 4 times better scores in FID and 3 times better sFID (realism)... We extend the evaluation of RCSB with three additional datasets: CelebA-HQ (Karras et al., 2018) with 30,000 samples of 256×256 resolution face images, CelebA (Liu et al., 2015) with around 200,000 samples of 128×128 resolution face images, and MNIST (Deng, 2012) with 70,000 samples of 32×32 resolution images of handwritten digits. |
| Dataset Splits | Yes | Following previous works for VCEs on ImageNet, we base the quantitative evaluation on 3 challenging main VCE generation tasks: Zebra ↔ Sorrel, Cheetah ↔ Cougar, Egyptian Cat ↔ Persian Cat, where each task requires creating VCEs for images from both classes and flipping the decision to their counterparts. (...) For ResNet50, this results in around 2000 images per task. (...) For MNIST, we train LeNet (Lecun et al., 1998) from scratch using the default training and validation splits. |
| Hardware Specification | Yes | The computational resources were provided by the Laboratory of Bioinformatics and Computational Genomics and the High Performance Computing Center of the Faculty of Mathematics and Information Science, Warsaw University of Technology. (...) Each inpainting algorithm is given a time budget of 24 A100 GPU-hours |
| Software Dependencies | Yes | Algorithm 4 (ADAM Update Rule), input: gradient g_n at step n, hyperparameters α, ϵ, β₁, β₂ (set to PyTorch (Paszke et al., 2019) defaults) |
| Experiment Setup | Yes | The best results are obtained with A(a = 0.1, c = 4, s = 3, τ = 0.6), but the superiority is clear for various configurations, including B(a = 0.2, c = 4, s = 1.5, τ = 0.6), C(a = 0.3, c = 4, s = 1.5, τ = 0.6). (...) By default, we use NFE=100, which we explored the most, but lower NFE regimes provided promising initial results. |
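The ADAM update rule quoted in the pseudocode cell (Algorithm 4) can be sketched in a few lines of NumPy. This is our own minimal illustration, not the authors' code: the function name `adam_update` and the `(m, v, n)` state tuple are hypothetical, while the hyperparameter defaults match the PyTorch Adam defaults the paper cites.

```python
import numpy as np

def adam_update(g, state, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the Adam-style gradient rescaling (Algorithm 4).

    `state` is the (first moment, second moment, step count) carried
    across reverse-diffusion steps; defaults follow PyTorch's Adam.
    """
    m, v, n = state
    n += 1
    m = beta1 * m + (1.0 - beta1) * g          # biased first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g ** 2     # biased second-moment estimate
    m_hat = m / (1.0 - beta1 ** n)             # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** n)             # bias-corrected second moment
    g_new = alpha * m_hat / (np.sqrt(v_hat) + eps)  # rescaled gradient
    return g_new, (m, v, n)
```

On the very first step the bias correction cancels the moment decay, so the returned gradient is approximately `alpha * sign(g)`, which is what makes the subsequent per-step normalization in Algorithm 3 well behaved.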
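Two other steps of the quoted RCSB pseudocode (Algorithm 3) are simple enough to sketch directly: the region-constrained initialization x₁ = (1−R) ⊙ x* + R ⊙ z, which injects Gaussian noise only inside the user-defined region, and the guidance update x_n ← x_n + s · g_n / ‖g_N‖₂, which rescales every guidance gradient by the norm of the first one. The function names below are our own; the diffusion model and classifier that produce x̂₀ and g_n are omitted.

```python
import numpy as np

def region_constrained_init(x, mask, rng):
    """x1 = (1 - R) * x + R * z with z ~ N(0, I).

    `mask` is the binary region R; pixels outside it keep the
    original image values, so edits stay confined to the region.
    """
    z = rng.standard_normal(x.shape)
    return (1.0 - mask) * x + mask * z

def guided_step(x_n, g_n, g_first_norm, s):
    """x_n <- x_n + s * g_n / ||g_N||_2 (Algorithm 3, line 11)."""
    return x_n + s * g_n / g_first_norm
```

Because the mask multiplies the noise rather than the gradient, the generative trajectory itself is what enforces the region constraint, with no post-hoc blending needed.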