Mesoscopic Insights: Orchestrating Multi-Scale & Hybrid Architecture for Image Manipulation Localization

Authors: Xuekang Zhu, Xiaochen Ma, Lei Su, Zhuohang Jiang, Bo Du, Xiwen Wang, Zeyu Lei, Wentao Feng, Chi-Man Pun, Ji-Zhe Zhou

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments across four datasets have demonstrated that our models surpass the current state-of-the-art in terms of performance, computational complexity, and robustness."
Researcher Affiliation | Academia | Xuekang Zhu1,2*, Xiaochen Ma1,2*, Lei Su1,2, Zhuohang Jiang3, Bo Du1,2, Xiwen Wang1,2, Zeyu Lei1,2, Wentao Feng1,2, Chi-Man Pun4, Ji-Zhe Zhou1,2 — 1: College of Computer Science, Sichuan University; 2: Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education of China; 3: The Hong Kong Polytechnic University; 4: Computer and Information Science, Faculty of Science and Technology, University of Macau
Pseudocode | No | The paper describes the Mesorch framework and its components, such as the DCT Module, Local Feature Module, Global Feature Module, and Adaptive Weighting Module, in textual form and through architectural diagrams (Figure 3), but it does not include explicit pseudocode blocks or algorithms.
Open Source Code | Yes | Code: https://github.com/scu-zjz/Mesorch
Open Datasets | Yes | "We conducted our model evaluations using the publicly recognized benchmark IMDLBenCo (Ma et al. 2024), across four widely used datasets: CASIAv1 (Dong, Wang, and Tan 2013), Coverage (Wen et al. 2016), NIST16 (Guan et al. 2019), and Columbia (Hsu and Chang 2006)."
Dataset Splits | Yes | "Our model was trained using the standardized Protocol-CAT dataset, provided through a codebase referenced in (Ma et al. 2024). This protocol includes established datasets and typical data augmentation methods."
Hardware Specification | Yes | "We conducted the training over 150 epochs, utilizing a batch size of 12 on four NVIDIA 4090 graphics cards."
Software Dependencies | No | The paper mentions using the AdamW optimizer and a cosine learning rate schedule, but does not provide version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or the programming language (e.g., Python).
Experiment Setup | Yes | "All images were resized to 512×512 pixels. We conducted the training over 150 epochs, utilizing a batch size of 12 on four NVIDIA 4090 graphics cards. The learning rate followed a cosine schedule (Loshchilov and Hutter 2017), starting at 1e-4 and tapering to a minimum of 5e-7, with a warm-up period of 2 epochs to gradually adjust the learning rate. The AdamW optimizer was used with a weight decay of 0.05 to mitigate overfitting. Furthermore, we set the accumulation iteration to 2, effectively doubling the batch size to enhance the model's generalization across diverse data inputs."