Multiple Instance Verification

Authors: Xin Xu, Eibe Frank, Geoffrey Holmes

JMLR 2025

Reproducibility checklist — each variable, its result, and the supporting LLM response:
Research Type: Experimental
  "Through empirical studies on three different verification tasks, we demonstrate that CAP outperforms adaptations of SOTA MIL methods and the baseline by substantial margins, in terms of both classification accuracy and the ability to detect key instances. Ablation studies attribute the superior ability to identify key instances to the new attention functions."
Researcher Affiliation: Academia
  "Xin Xu, Eibe Frank, Geoffrey Holmes — Department of Computer Science, University of Waikato, Hamilton, New Zealand"
Pseudocode: No
  The paper describes the model architectures and components using mathematical equations and textual descriptions in Sections 4.1.1 and 4.1.2, and illustrates them with figures, but does not include any explicitly labelled pseudocode or algorithm blocks.
Open Source Code: Yes
  "We share our code at https://github.com/xxweka/MIV."
Open Datasets: Yes
  "We collected QMNIST data (BSD-style license, https://github.com/facebookresearch/qmnist/blob/main/LICENSE) according to Yadav and Bottou (2019) and from links therein. ... We collected raw images of signatures, both authentic and forged, based on Liwicki et al. (2011) and from the link http://www.iapr-tc11.org/mediawiki/index.php/ICDAR_2011_Signature_Verification_Competition_(SigComp2011) ... For fact extraction and verification (FEVER), we collected the raw data of claims and evidence as in Thorne et al. (2018) and from the FEVER (2018) website https://fever.ai/dataset/fever.html"
Dataset Splits: Yes
  "For QMNIST: by construction, the sample sizes of the train/dev/test datasets are 21,509/2,408/2,253 respectively. ... For signature verification: the sample size for training is around 82,000, and for validation/test it is close to 10,000. For FEVER: we used the full set of raw validation/test data (6,616/6,613 exemplars respectively) as our validation/test datasets. To construct our training dataset in each round of experiments, we randomly sampled 33,000 exemplars from the raw training set..."
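The per-round FEVER training subsample described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the fixed seed, and the toy placeholder exemplars are all assumptions.

```python
import random


def sample_training_round(raw_train, n=33000, seed=0):
    """Draw one round's training subsample without replacement.

    `raw_train` stands in for the raw FEVER training exemplars; the
    authors' actual sampling procedure and seeding are not specified.
    """
    rng = random.Random(seed)
    return rng.sample(raw_train, n)


# Toy usage with placeholder exemplars standing in for FEVER claims.
raw_train = [{"claim": f"claim-{i}"} for i in range(50000)]
round1 = sample_training_round(raw_train, n=33000, seed=1)
print(len(round1))  # 33000
```

Re-sampling with a different seed per round would yield a fresh 33,000-exemplar training set each time while keeping the fixed validation/test sets untouched.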
Hardware Specification: Yes
  "All experiments were run on a cluster of four NVIDIA RTX A6000 GPUs; run duration varied depending on when early stopping was triggered."
Software Dependencies: Yes
  "All models were developed using TensorFlow 2.9.3, with some use of the TensorFlow official models 2.9.0 (Yu et al. (2020), Apache License 2.0) and scikit-learn 1.2.0 (Pedregosa et al. (2011), BSD License)."
Experiment Setup: Yes
  "For QMNIST: the learning rate of the RMSprop optimizer was piecewise constant: 1e-4 for the first 5 epochs, 5e-5 for the next 15 epochs, and 2e-5 for the remaining epochs. The mini-batch size was 768, and the early-stopping criterion was non-improvement of validation accuracy for 30 epochs. The number of heads was two for multi-head attention, when applicable."
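As a minimal sketch, the piecewise-constant QMNIST learning-rate schedule quoted above can be written as an epoch-indexed step function. The exact boundary epochs (0-indexed here) are an assumption; the paper only gives the three rates and their epoch counts.

```python
def qmnist_lr_schedule(epoch):
    """Piecewise-constant RMSprop learning rate as described:
    1e-4 for the first 5 epochs, 5e-5 for the next 15, 2e-5 afterwards.
    Zero-based epoch indexing is an assumption."""
    if epoch < 5:
        return 1e-4
    if epoch < 20:  # 5 initial + 15 subsequent epochs
        return 5e-5
    return 2e-5


rates = [qmnist_lr_schedule(e) for e in (0, 4, 5, 19, 20, 100)]
print(rates)  # [0.0001, 0.0001, 5e-05, 5e-05, 2e-05, 2e-05]
```

In TensorFlow 2.9, a function like this could be plugged in via `tf.keras.callbacks.LearningRateScheduler`, with `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=30)` implementing the stated early-stopping criterion; whether the authors wired it up this way is not stated in the quoted setup.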