Less is More: Fewer Interpretable Region via Submodular Subset Selection

Authors: Ruoyu Chen, Hua Zhang, Siyuan Liang, Jingzhi Li, Xiaochun Cao

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed method outperforms SOTA methods on two face datasets (Celeb-A and VGG-Face2) and one fine-grained dataset (CUB-200-2011). |
| Researcher Affiliation | Academia | Ruoyu Chen1,2, Hua Zhang1,2, Siyuan Liang3, Jingzhi Li1,2, Xiaochun Cao4. 1Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China; 3School of Computing, National University of Singapore, 119077, Singapore; 4School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China |
| Pseudocode | Yes | Algorithm 1: A greedy search based algorithm for interpretable region discovery |
| Open Source Code | Yes | The code is released at https://github.com/RuoyuChen10/SMDL-Attribution. |
| Open Datasets | Yes | We evaluate the proposed method on two face datasets, Celeb-A (Liu et al., 2015) and VGG-Face2 (Cao et al., 2018), and a fine-grained dataset, CUB-200-2011 (Welinder et al., 2010). |
| Dataset Splits | Yes | The Celeb-A dataset includes 10,177 IDs; we randomly select 2,000 identities from Celeb-A's validation set... The VGG-Face2 dataset includes 8,631 IDs; we randomly select 2,000 identities from VGG-Face2's validation set... CUB-200-2011 dataset... we select 3 samples for each class that is correctly predicted by the model from the CUB-200-2011 validation set for 200 classes... |
| Hardware Specification | Yes | These experiments were performed on an NVIDIA 3090 GPU. |
| Software Dependencies | No | The paper mentions using 'Xplique' but does not provide a specific version number. No other software dependencies with version numbers are listed. |
| Experiment Setup | Yes | For the two face datasets, we set N = 28 and m = 98. For the CUB-200-2011 dataset, we set N = 10 and m = 25. For the face datasets, we evaluated recognition models trained using the ResNet-101 (He et al., 2016) architecture and the ArcFace (Deng et al., 2019) loss function, with an input size of 112 × 112 pixels. For the CUB-200-2011 dataset, we evaluated a recognition model trained on the ResNet-101 architecture with a cross-entropy loss function and an input size of 224 × 224 pixels. To simplify parameter adjustment, all weighting coefficients are set to 1 by default. |
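The Pseudocode row refers to Algorithm 1, a greedy search for interpretable region discovery via submodular subset selection. As a rough illustration only, the generic template behind such an algorithm can be sketched as below; the coverage-style objective `cover` and the toy `regions` here are hypothetical stand-ins, not the paper's actual submodular function over image sub-regions.

```python
def greedy_select(candidates, score, k):
    """Greedily pick k elements that maximize a set function `score`.

    For a monotone submodular `score`, this greedy loop enjoys the
    classic (1 - 1/e) approximation guarantee (Nemhauser et al., 1978).
    """
    selected = []
    remaining = list(candidates)
    for _ in range(k):
        # Pick the candidate with the largest marginal gain.
        best = max(remaining,
                   key=lambda c: score(selected + [c]) - score(selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: each "region" covers a set of items; coverage is submodular.
regions = {0: {1, 2, 3}, 1: {3, 4, 5}, 2: {5}, 3: {1}}

def cover(subset):
    covered = set()
    for i in subset:
        covered |= regions[i]
    return len(covered)

print(greedy_select(list(regions), cover, 2))  # greedily chosen region pair
```

In the paper's setting, the candidates would be the m sub-regions of the input image and `score` the combined submodular objective; the greedy loop then returns the small interpretable subset.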