NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
Authors: Marlon Tobaben, Mohamed Ali Souibgui, Rubèn Tito, Khanh Nguyen, Raouf Kerkouche, Kangsoo Jung, Joonas Jälkö, Lei Kang, Andrey Barsky, Vincent Poulain d'Andecy, Aurélie Joseph, Aashiq Muhamed, Kevin Kuo, Virginia Smith, Yusuke Yamasaki, Takumi Fukami, Kenta Niwa, Iifan Tyou, Hiro Ishii, Rio Yokota, Ragul N, Rintu Kutum, Josep Llados, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. ... Participants fine-tuned a pre-trained, state-of-the-art Document Visual Question Answering model provided by the organizers for this new domain... The competition analysis provides best practices and recommendations... In the PFL-DocVQA Competition three main aspects are evaluated: the model's utility, the communication cost during training, and the DP privacy budget spent through training the model. ... 4 Track 1: Communication Efficient Federated Learning ... 4.3 Winner Track 1: Muhamed, Kuo, and Smith ... Table 2: Competition Winners Track 1 (Communication efficient federated learning) ... Figure A.4: Convergence curves for the global model using (Left) validation loss, (Center) validation accuracy, and (Right) validation ANLS. |
| Researcher Affiliation | Collaboration | Competition Organizers: 1University of Helsinki, 2Computer Vision Center, Universitat Autònoma de Barcelona, 3CISPA Helmholtz Center for Information Security, 4INRIA, 5Yooz. Winning Competition Participants: 6Carnegie Mellon University, 7NTT, 8Institute of Science Tokyo, 9Department of Computer Science, and Mphasis AI & Applied Tech Lab at Ashoka, Ashoka University |
| Pseudocode | Yes | The update rules of our method, named FedShampoo, are outlined in Algorithm 1 in Appendix C. ... We propose DP-CLGECL, which introduces AdamW as the local update, client sampling, and the Gaussian mechanism in DP for CLGECL, as summarized in Algorithm 2. |
| Open Source Code | Yes | The starter kit is openly available: https://github.com/rubenpt91/PFL-DocVQA-Competition. ... Our code is shared on GitHub: https://github.com/imkevinkuo/PFL-DocVQA-Competition. ... The GitHub repository of our solution can be accessed at https://github.com/KutumLab/pfl-docvqa-with-LoRA. |
| Open Datasets | Yes | For this competition, we used the PFL-DocVQA dataset (Tito et al., 2024), the first dataset for private federated DocVQA. The dataset is created using invoice document images gathered from the DocILE dataset (Šimsa et al., 2023). ... The dataset is described in more detail in Tito et al. (2024) and is available to download. The dataset is based on images from the DocILE dataset (Šimsa et al., 2023)... |
| Dataset Splits | Yes | Following this, the base data used in this competition consists of a training set divided among N clients (we use N = 10), a validation set, and a test set (see Figure A.1). The training set of each of the N clients contains invoices sampled from a different subset of providers... Table A1: Statistics on the base PFL-DocVQA Dataset in terms of number of Providers/Documents/Pages/Question-Answers. |
| Hardware Specification | Yes | In our code, we train one model at a time using data parallelism. Specifically, we split each batch over 8 GPUs, resulting in a batch size of 2 per GPU (we used 8 GeForce GTX 1080 Ti GPUs). ... Computing environment: We used a server with 8 GPUs (NVIDIA A6000 for NVLink, 40GiB HBM2) and 2 CPUs (Xeon). ... We utilize two NVIDIA A40 (40 GB VRAM each) and train for some hours to obtain the baselines. |
| Software Dependencies | No | The code itself is based on established libraries such as PyTorch (Paszke et al., 2019) and the FL framework Flower (Beutel et al., 2020). |
| Experiment Setup | Yes | In all experiments, clients perform local fine-tuning with batch size = 16 and learning rate = 2e-4. ... To ensure a fair comparison of the two methods, several hyperparameters (learning rate η and element-wise clipping threshold C) were empirically tuned. This was done while maintaining fixed values for the total communication rounds R = 10, the number of inner loops for local updates L = 5000, and the number of sampled clients K = 2. In Figure A.3, a summary of our hyperparameter tuning for FedShampoo is provided. After performing empirical trials, we selected η = 2e-4 and C = 0.2. ... The updates are clipped to a norm of 0.5 and the Gaussian noise is computed so that the privacy budgets of ϵ ∈ {1, 4, 8} at δ = 10^-5 are spent at the end of training. |
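The DP mechanism quoted in the Experiment Setup row (clip each client update to a fixed norm, then add Gaussian noise calibrated to an (ϵ, δ) budget) can be sketched as follows. This is a minimal illustration, not the competition code: the function name `dp_aggregate`, the toy updates, and the `noise_multiplier` value are assumptions; in practice the noise multiplier would be calibrated with a DP accountant so the budget ϵ ∈ {1, 4, 8} at δ = 10^-5 is spent after all rounds.

```python
import numpy as np

def dp_aggregate(client_updates, clip_norm=0.5, noise_multiplier=1.0, rng=None):
    """Clip each client's update to L2 norm clip_norm, sum, add Gaussian noise,
    and average. clip_norm bounds each client's sensitivity, so the noise std
    is proportional to noise_multiplier * clip_norm (the Gaussian mechanism)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # shrink only if norm > clip_norm
        clipped.append(u * scale)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

# Example: K = 2 sampled clients with toy 4-dimensional updates
updates = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.1, 0.1, 0.0, 0.0])]
avg = dp_aggregate(updates)
```

With `noise_multiplier=0.0` the function reduces to plain clipped averaging, which is a convenient sanity check before enabling noise.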