Robust Image Hashing Based on Contrastive Masked Autoencoder with Weak-Strong Augmentation Alignment

Authors: Cundian Yang, Guibo Luo, Yuesheng Zhu, Jiaqi Li, Xiyao Liu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate that our method exhibits remarkable robustness against various attacks, including challenging ones such as rotation and hybrid attacks, and delivers excellent identification performance with a F1 score close to 1.0."
Researcher Affiliation | Academia | (1) Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Shenzhen Graduate School, Peking University; (2) School of Computer Science and Engineering, Central South University
Pseudocode | No | The paper describes the methodology and framework in text and with a diagram (Figure 1), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | "Our code and supplementary materials are available on Github." GitHub: https://github.com/pikeyang/cmaa
Open Datasets | Yes | "We train our autoencoder using the COCO mini-train dataset proposed by (Samet, Hicsonmez, and Akbas 2020), which is a subset of the COCO train2017 dataset. In the experiments, we use three datasets, i.e., CASIA dataset (Dong, Wang, and Tan 2013), Copydays dataset (Jégou et al. 2011), and UCID dataset (Schaefer and Stich 2003) to evaluate the performance of different methods."
Dataset Splits | No | The paper states: "We train our autoencoder using the COCO mini-train dataset... This subset comprises 25,000 images, accounting for approximately 20% of the train2017 set." It also mentions using the CASIA, Copydays, and UCID datasets for evaluation and "Nt test images" for identification. However, it does not provide specific training, validation, or test split percentages or counts for any of the datasets used, nor does it refer to standard predefined splits in a reproducible manner.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as GPU models, CPU types, or memory specifications. It only vaguely mentions 'computing resources' in relation to fine-tuning.
Software Dependencies | No | The paper mentions 'ViT-Small' as an encoder backbone, but it does not specify any software dependencies (e.g., programming languages, libraries, or frameworks) with their corresponding version numbers required to reproduce the work.
Experiment Setup | Yes | "Specifically, we randomly mask some patches of x^2_{i,patch} with a default masking rate of m = 30%, resulting in T' = (1 − m) · T patches... t > 0 is a temperature parameter set to t = 0.1 by default. Here K is the size of the Memory Bank (a first-in, first-out queue) that stores the feature embeddings of other images from previous batches, where K is set to 2048... the momentum encoder (E_m) is updated from the encoder (E) using an exponential moving average (EMA) with a momentum smoothing factor α = 0.999... all models of the ablation experiments are trained for 100 epochs on the COCO mini-train dataset."
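The setup quoted above names three standard contrastive-learning components: an EMA momentum-encoder update (α = 0.999), a FIFO memory bank of K = 2048 negative embeddings, and a temperature-scaled (t = 0.1) InfoNCE loss. A minimal NumPy sketch of those mechanics is given below; it is not the authors' implementation, and all function and class names here are hypothetical illustrations of the generic technique.

```python
import numpy as np
from collections import deque

# Hyperparameters quoted in the paper's experiment setup
MASK_RATE = 0.30    # m: fraction of patches masked
TEMPERATURE = 0.1   # t: softmax temperature for the contrastive loss
QUEUE_SIZE = 2048   # K: memory-bank capacity
MOMENTUM = 0.999    # alpha: EMA smoothing factor for the momentum encoder


def ema_update(encoder_params, momentum_params, alpha=MOMENTUM):
    """EMA update of the momentum encoder: p_m <- alpha*p_m + (1-alpha)*p_e."""
    return [alpha * pm + (1.0 - alpha) * pe
            for pe, pm in zip(encoder_params, momentum_params)]


class MemoryBank:
    """First-in, first-out queue of feature embeddings from past batches."""

    def __init__(self, size=QUEUE_SIZE):
        # deque with maxlen silently evicts the oldest entries (FIFO)
        self.queue = deque(maxlen=size)

    def enqueue(self, embeddings):
        for e in embeddings:          # one embedding vector per row
            self.queue.append(e)

    def negatives(self):
        return np.stack(self.queue)   # (<=K, dim) array of negatives


def info_nce(query, positive, negatives, t=TEMPERATURE):
    """InfoNCE loss for one query vs. one positive and K negatives."""
    q = query / np.linalg.norm(query)
    pos = positive / np.linalg.norm(positive)
    negs = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    logits = np.concatenate([[q @ pos], negs @ q]) / t
    logits -= logits.max()            # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

With a 30% masking rate, an encoder that sees T patches keeps roughly round((1 − MASK_RATE) * T) of them; the rest are dropped before encoding, as in a standard masked autoencoder.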