Multiple Instance Verification
Authors: Xin Xu, Eibe Frank, Geoffrey Holmes
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical studies on three different verification tasks, we demonstrate that CAP outperforms adaptations of SOTA MIL methods and the baseline by substantial margins, in terms of both classification accuracy and the ability to detect key instances. Ablation studies attribute the superior ability to identify key instances to the new attention functions. |
| Researcher Affiliation | Academia | Xin Xu EMAIL Eibe Frank EMAIL Geoffrey Holmes EMAIL Department of Computer Science University of Waikato Hamilton, New Zealand |
| Pseudocode | No | The paper describes the model architectures and components using mathematical equations and textual descriptions in Section 4.1.1 and 4.1.2, and illustrates them with figures, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We share our code at https://github.com/xxweka/MIV. |
| Open Datasets | Yes | We collected QMNIST data (BSD-style license, https://github.com/facebookresearch/qmnist/blob/main/LICENSE) according to Yadav and Bottou (2019) and from links therein. ... We collected raw images of signatures, both authentic and forged, based on Liwicki et al. (2011) and from the link http://www.iapr-tc11.org/mediawiki/index.php/ICDAR_2011_Signature_Verification_Competition_(SigComp2011)... For fact extraction and verification (FEVER), we collected the raw data of claims and evidence as in Thorne et al. (2018) and from the FEVER (2018) website https://fever.ai/dataset/fever.html |
| Dataset Splits | Yes | For QMNIST: By construction, the sample sizes of the train/dev/test datasets are 21,509/2,408/2,253 respectively... For Signature Verification: The sample size for training is around 82,000, and for validation/test is close to 10,000. For FEVER: we used the full set of raw validation/test data (6616/6613 exemplars respectively) as our validation/test datasets. To construct our training dataset in each round of experiments, we randomly sampled 33,000 exemplars from the raw training set... |
| Hardware Specification | Yes | All experiments were run on a cluster of four NVIDIA RTX A6000 GPUs; the duration varies depending on early-stopping triggers. |
| Software Dependencies | Yes | All models were developed using TensorFlow 2.9.3, with some use of TensorFlow official models 2.9.0 (Yu et al. (2020), Apache License 2.0) and scikit-learn 1.2.0 (Pedregosa et al. (2011), BSD License). |
| Experiment Setup | Yes | For QMNIST: The learning rate of the RMSprop optimizer was piece-wise constant: 1e-4 for the first 5 epochs, 5e-5 for the next 15 epochs, and 2e-5 for the remaining epochs. The mini-batch size was 768, and the early-stopping criterion was non-improvement of validation accuracy for 30 epochs. The number of heads was two for multi-head attention, when applicable. |
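The QMNIST training configuration quoted above (a piece-wise constant learning rate and accuracy-based early stopping) can be sketched in plain Python. This is an illustrative reconstruction from the reported hyperparameters, not the authors' released code; the function and class names are our own, and in practice the schedule would be wired into an RMSprop optimizer (e.g. via a TensorFlow learning-rate schedule).

```python
def learning_rate(epoch):
    """Piece-wise constant schedule reported for QMNIST:
    1e-4 for the first 5 epochs, 5e-5 for the next 15, 2e-5 thereafter."""
    if epoch < 5:
        return 1e-4
    if epoch < 20:
        return 5e-5
    return 2e-5


class EarlyStopping:
    """Stop when validation accuracy has not improved for `patience`
    consecutive epochs (patience=30 in the reported setup).
    Illustrative helper only."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best = float("-inf")
        self.wait = 0

    def update(self, val_accuracy):
        """Record one epoch's validation accuracy; return True to stop."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience
```

A training loop would call `learning_rate(epoch)` before each epoch (mini-batch size 768 per the report) and break out of the loop once `EarlyStopping.update(...)` returns `True`.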