Gradient Inversion of Multimodal Models
Authors: Omri Ben Hemo, Alon Zolfi, Oryan Yehezkel, Omer Hofman, Roman Vainshtein, Hisashi Kojima, Yuval Elovici, Asaf Shabtai
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive evaluation on state-of-the-art DQA models, our approach exposes critical privacy vulnerabilities and highlights the urgent need for robust defenses to secure multimodal FL systems. ... To evaluate the effectiveness of our proposed method, we conduct extensive experiments on state-of-the-art DQA models, including both OCR-based and OCR-free architectures. |
| Researcher Affiliation | Collaboration | 1Ben Gurion University of the Negev, Israel 2Fujitsu Research of Europe 3Fujitsu Limited. Correspondence to: Omri Ben Hemo / Alon Zolfi <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (Safe Template). Input: model f_θ, visual document x_D, question x_Q, answer y_ans, perturbation budget ε, norm p, step size α. Output: perturbed template. |
| Open Source Code | No | Project page at: https://AlonZolfi.github.io/GI-DQA/. The URL points to a project page, which typically serves as a demonstration or overview, rather than to a source-code repository, and the paper makes no explicit statement of code release. |
| Open Datasets | Yes | We use the PFL-DocVQA (Tito et al., 2024) dataset, designed to perform DocVQA in an FL environment, with the aim of exposing privacy leakage issues in a realistic scenario. |
| Dataset Splits | No | We use the PFL-DocVQA (Tito et al., 2024) dataset... For our experiments, we use a subset of the original dataset containing 395 documents. The subset includes 90 templates, each with approximately five distinct documents (the sensitive data differs between the documents of the same template). The paper mentions the total number of documents and templates used but does not specify how these are split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. |
| Software Dependencies | No | The optimized pixels in the reconstructed document image are randomly initialized and updated using the Adam optimizer with an initial learning rate of 2.0, applying exponential decay with a rate of λ = 0.999 over 5,000 iterations. While the Adam optimizer is mentioned, no specific version numbers for software libraries or programming languages (e.g., PyTorch, TensorFlow, Python) are provided. |
| Experiment Setup | Yes | The optimized pixels in the reconstructed document image are randomly initialized and updated using the Adam optimizer with an initial learning rate of 2.0, applying exponential decay with a rate of λ = 0.999 over 5,000 iterations. The auxiliary loss terms (Equation 4) are weighted using the coefficients αtxt = 0.1, αgau = 0.01, and αTV = 0.1. These values were selected using the grid search approach over the values {0, 0.001, 0.01, 0.1, 1}, optimizing for PSNR performance. |
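The signature quoted for Algorithm 1 (model f_θ, document x_D, perturbation budget ε, norm p, step size α) matches a standard projected-gradient loop. Below is a minimal PGD-style sketch under that reading for p = ∞; the objective `model(x)` is a stand-in scalar loss, not the paper's actual optimization target, and `safe_template` with these defaults is a hypothetical illustration.

```python
import torch

def safe_template(model, template, eps, alpha, steps=10):
    # PGD-style sketch of a "Safe Template" loop: perturb the template
    # within an L-inf ball of radius eps, taking sign-gradient steps of
    # size alpha on a stand-in objective `model(x)` (scalar loss).
    x = template.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = model(x)
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x += alpha * grad.sign()                              # ascent step
            x.copy_(template + (x - template).clamp(-eps, eps))   # project to budget
            x.clamp_(0.0, 1.0)                                    # valid pixel range
    return x.detach()
```

With a toy quadratic objective, the perturbation saturates at the ε boundary after a few steps, as expected for sign-gradient ascent.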
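The reconstruction setup quoted above (random pixel initialization, Adam with lr = 2.0, exponential decay λ = 0.999 over 5,000 iterations, TV-weighted auxiliary loss) can be sketched as a gradient-matching loop. This is an assumption-laden illustration: `TinyModel` is a hypothetical stand-in for the DQA model, only the α_TV term from Equation 4 is shown (the α_txt and α_gau terms are omitted), and the gradient-matching distance is a plain squared error.

```python
import torch

class TinyModel(torch.nn.Module):
    # Hypothetical stand-in for the DQA model: any module whose forward
    # returns a scalar loss works with the loop below.
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(16, 1)

    def forward(self, x):
        return self.lin(x.flatten(1)).pow(2).mean()

def total_variation(img):
    # Anisotropic total-variation prior (the alpha_TV-weighted term).
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

def reconstruct(target_grads, model, shape, steps=5000, lr=2.0,
                decay=0.999, a_tv=0.1):
    x = torch.rand(shape, requires_grad=True)        # random pixel init
    opt = torch.optim.Adam([x], lr=lr)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=decay)
    for _ in range(steps):
        opt.zero_grad()
        grads = torch.autograd.grad(model(x), model.parameters(),
                                    create_graph=True)
        # Match the client's shared gradients, plus the TV auxiliary term.
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        loss = match + a_tv * total_variation(x)
        loss.backward()
        opt.step()
        sched.step()
    return x.detach()
```

The lr = 2.0 / 5,000-iteration defaults mirror the quoted setup; for the toy model a far smaller learning rate and step count suffice.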