Rethinking Visual Counterfactual Explanations Through Region Constraint

Authors: Bartlomiej Sobieski, Jakub Grzywaczewski, Bartłomiej Sadlej, Matthew Tivnan, Przemyslaw Biecek

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through large-scale experiments, we demonstrate that, besides a fully automated way of synthesizing meaningful and highly interpretable RVCEs, our approach, Region-constrained Counterfactual Schrödinger Bridge (RCSB), allows to infer causally about the model's change in prediction and enables the user to actively interact with the explanatory process by manually defining the region of interest. (...) 4 EXPERIMENTS (results for the Zebra → Sorrel task):

Method | FID | sFID | S3 | COUT | FR
ACE ℓ1 | 84.5 | 122.7 | 0.92 | 0.45 | 47.0
ACE ℓ2 | 67.7 | 98.4 | 0.90 | 0.25 | 81.0
LDCE-cls | 84.2 | 107.2 | 0.78 | 0.06 | 88.0
LDCE-txt | 82.4 | 107.2 | 0.71 | 0.21 | 81.0
DVCE | 33.1 | 43.9 | 0.62 | 0.21 | 57.8
RCSB_C | 13.0 | 20.4 | 0.82 | 0.70 | 99.7
RCSB_B | 9.51 | 17.4 | 0.86 | 0.72 | 97.4
RCSB_A | 8.0 | 16.2 | 0.88 | 0.74 | 94.7
Researcher Affiliation | Academia | Bartlomiej Sobieski (University of Warsaw), Jakub Grzywaczewski (Warsaw University of Technology), Bartlomiej Sadlej (University of Warsaw), Matthew Tivnan (Harvard Medical School), Przemyslaw Biecek (University of Warsaw; Warsaw University of Technology)
Pseudocode | Yes | For the pseudocode of the entire procedure, see Appendix. We include our implementation at https://github.com/sobieskibj/rcsb. (...) A Pseudocode

Algorithm 1: Standard I2SB Generation
1: Input: x_N ~ p_1(x_N), trained s_ψ(·, ·)
2: for n = N to 1 do
3:   Predict x̂_0(x_n) using s_ψ(x_n, t_n)
4:   x_{n−1} ~ p(x_{n−1} | x̂_0, x_n) according to DDPM
5: end for
6: return x_0

Algorithm 2: OT-ODE I2SB Generation
1: Input: x_N ~ p_1(x_N), trained s_ψ(·, ·)
2: for n = N to 1 do
3:   Predict x̂_0(x_n) using s_ψ(x_n, t_n)
4:   x_{n−1} = μ̄_{n−1} x̂_0 + μ_{n−1} x_n
5: end for
6: return x_0

Algorithm 3: RCSB
1: Input: number of steps N, binary region mask R, trajectory truncation τ, classifier scale s, input image x*, trained s_ψ(·, ·), trained classifier f(y | ·), target class y
2: x_1 = (1 − R) ⊙ x* + R ⊙ z, where z ~ N(z; 0, I)
3: Discretize truncated timeline 0 = t_0 < t_1 < … < t_N = τ
4: x_N ~ q(x_N | x_0, x_1)  # sample from analytic posterior (Eq. (15))
5: for n = N to 1 do
6:   Predict x̂_0(x_n) using s_ψ(x_n, t_n)
7:   g_n = ∇_{x_n} log f(y | x̂_0)
8:   g_n = ADAM(g_n)
9:   if n == N then ḡ = ‖g_N‖_2  # register norm of the first gradient
10:  end if
11:  x_n = x_n + s · g_n / ḡ
12:  x_{n−1} = μ̄_{n−1} x̂_0 + μ_{n−1} x_n
13: end for
14: return x_0

Algorithm 4: ADAM Update Rule
1: Input: gradient at step n, g_n; hyperparameters α, ϵ, β_1, β_2 (set to PyTorch (Paszke et al., 2019) defaults)
2: m_n = β_1 m_{n−1} + (1 − β_1) g_n  # update biased first moment estimate
3: v_n = β_2 v_{n−1} + (1 − β_2) g_n²  # update biased second moment estimate
4: m̂_n = m_n / (1 − β_1^n)  # compute bias-corrected first moment
5: v̂_n = v_n / (1 − β_2^n)  # compute bias-corrected second moment
6: g_n = α m̂_n / (√v̂_n + ϵ)  # update gradient
7: return g_n  # return updated gradient
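As a rough illustration, the guided loop of Algorithm 3 can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: `score_model`, `log_prob_grad`, and `grad_transform` are hypothetical stand-ins for the trained I2SB network s_ψ, the classifier gradient ∇ log f(y | x̂_0), and the Adam preprocessing, and the interpolation coefficients are illustrative rather than the paper's I2SB schedule.

```python
import numpy as np

def rcsb_sample(x_star, mask, score_model, log_prob_grad,
                grad_transform=lambda g: g,
                n_steps=10, tau=0.6, scale=3.0, seed=0):
    """Hedged sketch of region-constrained guided generation (Algorithm 3).

    x_star : input image, mask : binary region of interest R,
    score_model(x, t) : stand-in for s_psi predicting x0_hat,
    log_prob_grad(x0_hat) : stand-in for grad log f(y | x0_hat),
    grad_transform : stand-in for the Adam preprocessing of Algorithm 4.
    """
    rng = np.random.default_rng(seed)
    # Noise only inside the region R; the rest keeps the input image.
    z = rng.standard_normal(x_star.shape)
    x = (1 - mask) * x_star + mask * z  # stand-in for sampling x_N from
                                        # the analytic posterior q(x_N | x0, x1)
    # Truncated timeline 0 = t_0 < t_1 < ... < t_N = tau.
    ts = np.linspace(0.0, tau, n_steps + 1)
    g_norm = None
    for n in range(n_steps, 0, -1):
        x0_hat = score_model(x, ts[n])              # predict x0 from x_n
        g = grad_transform(log_prob_grad(x0_hat))   # guidance direction
        if g_norm is None:                          # register first grad norm
            g_norm = np.linalg.norm(g) + 1e-12
        x = x + scale * g / g_norm                  # normalized guidance step
        # Illustrative OT-ODE-style interpolation toward x0_hat; the real
        # coefficients mu_{n-1}, mu-bar_{n-1} come from the I2SB schedule.
        mu = ts[n - 1] / ts[n]
        x = (1 - mu) * x0_hat + mu * x
    return x
```

With a zero classifier gradient and a score model that always returns the input image, the loop simply contracts back onto the input, which is a convenient sanity check for the control flow.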
Open Source Code | Yes | We include our implementation at https://github.com/sobieskibj/rcsb.
Open Datasets | Yes | Specifically, we set a new quantitative state-of-the-art (SOTA) on ImageNet (Deng et al., 2009) with up to 4 times better scores in FID and 3 times better sFID (realism)... We extend the evaluation of RCSB with three additional datasets: CelebA-HQ (Karras et al., 2018) with 30,000 samples of 256×256 resolution face images, CelebA (Liu et al., 2015) with around 200,000 samples of 128×128 resolution face images, and MNIST (Deng, 2012) with 70,000 samples of 32×32 resolution images of handwritten digits.
Dataset Splits | Yes | Following previous works for VCEs on ImageNet, we base the quantitative evaluation on 3 challenging main VCE generation tasks: Zebra ↔ Sorrel, Cheetah ↔ Cougar, Egyptian Cat ↔ Persian Cat, where each task requires creating VCEs for images from both classes and flipping the decision to their counterparts. (...) For ResNet50, this results in around 2000 images per task. (...) For MNIST, we train LeNet (Lecun et al., 1998) from scratch using the default training and validation splits.
Hardware Specification | Yes | The computational resources were provided by the Laboratory of Bioinformatics and Computational Genomics and the High Performance Computing Center of the Faculty of Mathematics and Information Science, Warsaw University of Technology. (...) Each inpainting algorithm is given a time budget of 24 A100 GPU-hours.
Software Dependencies | Yes | Algorithm 4 ADAM Update Rule. Input: gradient at step n, g_n; hyperparameters α, ϵ, β_1, β_2 (set to PyTorch (Paszke et al., 2019) defaults)
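For concreteness, the update rule of Algorithm 4 can be written out in a few lines. This is a hedged NumPy sketch (the paper applies it in PyTorch); the defaults below match PyTorch's `torch.optim.Adam` (α = 1e-3, ϵ = 1e-8, β₁ = 0.9, β₂ = 0.999), and the function/state names are ours, not the authors'.

```python
import numpy as np

def adam_update(g, state=None, alpha=1e-3, eps=1e-8, beta1=0.9, beta2=0.999):
    """One step of the Adam update rule (Algorithm 4).

    `state` carries (m, v, step count) between calls; pass None on the
    first call. Returns the transformed gradient and the new state.
    """
    m, v, n = state if state is not None else (0.0, 0.0, 0)
    n += 1
    m = beta1 * m + (1 - beta1) * g          # biased first moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2     # biased second moment estimate
    m_hat = m / (1 - beta1 ** n)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** n)             # bias-corrected second moment
    g_out = alpha * m_hat / (np.sqrt(v_hat) + eps)
    return g_out, (m, v, n)
```

A useful property to verify: on the very first step, bias correction cancels the moment decay, so the update reduces to approximately α · sign(g) regardless of the gradient's magnitude.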
Experiment Setup | Yes | The best results are obtained with A (a = 0.1, c = 4, s = 3, τ = 0.6), but the superiority is clear for various configurations, including B (a = 0.2, c = 4, s = 1.5, τ = 0.6) and C (a = 0.3, c = 4, s = 1.5, τ = 0.6). (...) By default, we use NFE = 100, which we explored the most, but lower NFE regimes provided promising initial results.
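For reproduction bookkeeping, the quoted configurations can be recorded as plain dictionaries. This is only a transcription of the values quoted above; the field names `a`, `c`, `s`, `tau` mirror the paper's notation, and the container names are ours.

```python
# Hyperparameter configurations A, B, C as quoted above; nothing beyond
# the quoted values is assumed.
RCSB_CONFIGS = {
    "A": {"a": 0.1, "c": 4, "s": 3.0, "tau": 0.6},  # best-scoring setup
    "B": {"a": 0.2, "c": 4, "s": 1.5, "tau": 0.6},
    "C": {"a": 0.3, "c": 4, "s": 1.5, "tau": 0.6},
}
DEFAULT_NFE = 100  # default number of function evaluations per the quote
```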