Gradient Inversion of Multimodal Models
Authors: Omri Ben Hemo, Alon Zolfi, Oryan Yehezkel, Omer Hofman, Roman Vainshtein, Hisashi Kojima, Yuval Elovici, Asaf Shabtai
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive evaluation on state-of-the-art DQA models, our approach exposes critical privacy vulnerabilities and highlights the urgent need for robust defenses to secure multimodal FL systems. ... To evaluate the effectiveness of our proposed method, we conduct extensive experiments on state-of-the-art DQA models, including both OCR-based and OCR-free architectures. |
| Researcher Affiliation | Collaboration | 1Ben Gurion University of the Negev, Israel 2Fujitsu Research of Europe 3Fujitsu Limited. Correspondence to: Omri Ben Hemo / Alon Zolfi <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (Safe Template). Input: model f_θ, visual document x_D, question x_Q, answer y_ans, perturbation budget ε, norm p, step size α. Output: perturbed template. |
| Open Source Code | No | Project page at: https://AlonZolfi.github.io/GI-DQA/. The URL points to a project page, which typically serves as a demonstration or overview, rather than to a source-code repository, and the paper makes no explicit statement of code release. |
| Open Datasets | Yes | We use the PFL-DocVQA (Tito et al., 2024) dataset, designed to perform DocVQA in an FL environment, with the aim of exposing privacy leakage issues in a realistic scenario. |
| Dataset Splits | No | We use the PFL-DocVQA (Tito et al., 2024) dataset... For our experiments, we use a subset of the original dataset containing 395 documents. The subset includes 90 templates, each with approximately five distinct documents (the sensitive data differs between the documents of the same template). The paper mentions the total number of documents and templates used but does not specify how these are split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. |
| Software Dependencies | No | The optimized pixels in the reconstructed document image are randomly initialized and updated using the Adam optimizer with an initial learning rate of 2.0, applying exponential decay with a rate of λ = 0.999 over 5,000 iterations. While the Adam optimizer is mentioned, no specific version numbers for software libraries or programming languages (e.g., PyTorch, TensorFlow, Python) are provided. |
| Experiment Setup | Yes | The optimized pixels in the reconstructed document image are randomly initialized and updated using the Adam optimizer with an initial learning rate of 2.0, applying exponential decay with a rate of λ = 0.999 over 5,000 iterations. The auxiliary loss terms (Equation 4) are weighted using the coefficients αtxt = 0.1, αgau = 0.01, and αTV = 0.1. These values were selected using the grid search approach over the values {0, 0.001, 0.01, 0.1, 1}, optimizing for PSNR performance. |
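The signature quoted for Algorithm 1 (model f_θ, document x_D, perturbation budget ε, norm p, step size α) matches a standard projected-gradient loop. Below is a minimal PGD-style sketch under that reading for p = ∞; the objective `model(x)` is a stand-in scalar loss, not the paper's actual optimization target, and `safe_template` with these defaults is a hypothetical illustration.

```python
import torch

def safe_template(model, template, eps, alpha, steps=10):
    # PGD-style sketch of a "Safe Template" loop: perturb the template
    # within an L-inf ball of radius eps, taking sign-gradient steps of
    # size alpha on a stand-in objective `model(x)` (scalar loss).
    x = template.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = model(x)
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x += alpha * grad.sign()                              # ascent step
            x.copy_(template + (x - template).clamp(-eps, eps))   # project to budget
            x.clamp_(0.0, 1.0)                                    # valid pixel range
    return x.detach()
```

With a toy quadratic objective, the perturbation saturates at the ε boundary after a few steps, as expected for sign-gradient ascent.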
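The reconstruction setup quoted above (random pixel initialization, Adam with lr = 2.0, exponential decay λ = 0.999 over 5,000 iterations, TV-weighted auxiliary loss) can be sketched as a gradient-matching loop. This is an assumption-laden illustration: `TinyModel` is a hypothetical stand-in for the DQA model, only the α_TV term from Equation 4 is shown (the α_txt and α_gau terms are omitted), and the gradient-matching distance is a plain squared error.

```python
import torch

class TinyModel(torch.nn.Module):
    # Hypothetical stand-in for the DQA model: any module whose forward
    # returns a scalar loss works with the loop below.
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(16, 1)

    def forward(self, x):
        return self.lin(x.flatten(1)).pow(2).mean()

def total_variation(img):
    # Anisotropic total-variation prior (the alpha_TV-weighted term).
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

def reconstruct(target_grads, model, shape, steps=5000, lr=2.0,
                decay=0.999, a_tv=0.1):
    x = torch.rand(shape, requires_grad=True)        # random pixel init
    opt = torch.optim.Adam([x], lr=lr)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=decay)
    for _ in range(steps):
        opt.zero_grad()
        grads = torch.autograd.grad(model(x), model.parameters(),
                                    create_graph=True)
        # Match the client's shared gradients, plus the TV auxiliary term.
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        loss = match + a_tv * total_variation(x)
        loss.backward()
        opt.step()
        sched.step()
    return x.detach()
```

The lr = 2.0 / 5,000-iteration defaults mirror the quoted setup; for the toy model a far smaller learning rate and step count suffice.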