QPM: Discrete Optimization for Globally Interpretable Image Classification

Authors: Thomas Norrenbrock, Timo Kaiser, Sovan Biswas, Ramesh Manuvinakurike, Bodo Rosenhahn

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations show that QPM delivers unprecedented global interpretability across small and large-scale datasets while setting the state of the art for the accuracy of interpretable models. ... The usual metrics for compactness-based globally interpretable models are shown in table 2. ... The results for the interpretability metrics are shown in table 3. ... This section validates the impact of the individual objectives in the quadratic problem in table 4 and presents the compactness trade-off in fig. 7. We focus on CUB-2011 but observed similar results for other datasets.
Researcher Affiliation | Collaboration | Thomas Norrenbrock, Timo Kaiser & Bodo Rosenhahn, Institute for Information Processing (TNT), L3S, Leibniz Universität Hannover, Germany (EMAIL); Sovan Biswas & Ramesh Manuvinakurike, Intel Labs, USA (EMAIL)
Pseudocode | No | The paper presents mathematical formulations for the Quadratic Problem (equations 1-7) and details its standard form in Appendix O, which defines variables, objective functions, and constraints. However, it does not include a figure, block, or section explicitly labeled 'Pseudocode' or 'Algorithm', nor does it provide structured, step-by-step procedures formatted like code; it gives a mathematical optimization problem definition instead.
Open Source Code | Yes | Code: https://github.com/ThomasNorr/QPM
Open Datasets | Yes | Following prototype-based methods we applied our method to CUB-2011 (Wah et al., 2011) and Stanford Cars (Krause et al., 2013). ... we also include results on the large-scale dataset ImageNet-1K (Russakovsky et al., 2015).
Dataset Splits | Yes | An overview of the used datasets is shown in Suppl. table 5. (Table 5 shows # Training and # Testing counts for CUB-2011, Stanford Cars, Traveling Birds, ImageNet-1K.)
Hardware Specification | Yes | In our experiments, the global optimum for the relaxed problem is usually found in less than 4 hours for fine-grained datasets, and roughly 11 hours for ImageNet-1K using a CPU like EPYC 72F3.
Software Dependencies | No | We implement our method using PyTorch (Paszke et al., 2019) and its ImageNet-1K pretrained models as feature extractors. ... This section presents details on how the described quadratic problem with eq. (7) as objective is solved using Gurobi (Gurobi Optimization, LLC, 2023). While these name the software tools used, they do not give version numbers for PyTorch or Gurobi, which a reproducible description of ancillary software requires.
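The row above notes that QPM's quadratic problem is handed to Gurobi. As a toy illustration of the general shape of such a problem, a binary quadratic objective under a cardinality constraint, here is a brute-force enumerator in plain Python. The matrix Q, the linear costs c, and the budget k below are made up for illustration; this stands in for a solver like Gurobi and is not the paper's actual formulation from Appendix O.

```python
from itertools import product

def solve_binary_qp(Q, c, k):
    """Brute-force minimizer of x^T Q x + c^T x over binary vectors x
    with exactly k ones. Toy illustration only: real problems of this
    kind are handed to a dedicated solver such as Gurobi."""
    n = len(c)
    best_x, best_val = None, float("inf")
    for bits in product((0, 1), repeat=n):
        if sum(bits) != k:
            continue
        # Quadratic term x^T Q x plus linear term c^T x.
        val = sum(Q[i][j] * bits[i] * bits[j]
                  for i in range(n) for j in range(n))
        val += sum(c[i] * bits[i] for i in range(n))
        if val < best_val:
            best_x, best_val = bits, val
    return best_x, best_val

# Hypothetical 3-variable instance: select exactly k=2 features.
# Off-diagonal entries of Q penalize (positive) or reward (negative)
# picking two features together.
Q = [[0, 2, -1],
     [2, 0, 0],
     [-1, 0, 0]]
c = [-1, -1, -1]
x, v = solve_binary_qp(Q, c, 2)  # features 0 and 2 attract each other
```

Enumeration is exponential in the number of variables, which is exactly why the paper relies on a relaxation solved by Gurobi rather than exhaustive search.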
Experiment Setup | Yes | The remaining parameters, including dense training for 150 epochs on fine-grained datasets and directly using the pretrained model on ImageNet-1K with subsequent 40 epochs of fine-tuning, mirror the SLDD-Model and are described in appendix C. ... We fine-tune the pretrained models on the fine-grained datasets using stochastic gradient descent with a batch size of 16 for 150 epochs. The learning rate starts at 5 x 10^-3 for the pretrained layers and 0.01 for the final linear layer and gets multiplied by 0.4 every 30 epochs. We set momentum to 0.9, ℓ2-regularization to 5 x 10^-4 and apply dropout with rate 0.2 to the features. ... After solving the quadratic problem, the model is trained with the final layer fixed to the sparse assignment of selected features W for 40 epochs. The learning rate starts at 100 times the final learning rate of the dense training and decreases by 60% every 10 epochs. During fine-tuning, momentum is increased to 0.95 and dropout on the features is reduced to 10%.
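The two-phase learning-rate schedule quoted above can be sketched in a few lines of plain Python. This is a sketch under stated assumptions, not the authors' code: it reads "decreases by 60%" as multiplying by 0.4, and assumes step decays land at epoch boundaries 30, 60, ... (dense phase) and 10, 20, ... (fine-tuning phase).

```python
def dense_lr(epoch, base_lr=5e-3, decay=0.4, step=30):
    """Learning rate for the pretrained layers during the
    150-epoch dense training phase (x0.4 every 30 epochs)."""
    return base_lr * decay ** (epoch // step)

def finetune_lr(epoch, dense_final_lr, decay=0.4, step=10):
    """Fine-tuning phase after the quadratic problem is solved:
    starts at 100x the final dense lr, decreases by 60%
    (i.e. x0.4) every 10 epochs for 40 epochs."""
    return 100 * dense_final_lr * decay ** (epoch // step)

final_dense = dense_lr(149)          # 5e-3 * 0.4**4, the lr in the last dense epoch
start_finetune = finetune_lr(0, final_dense)  # 100x the final dense lr
```

Under these assumptions the fine-tuning phase starts at a higher rate than dense training ever used, consistent with the final layer now being frozen to the sparse assignment W.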