NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
Authors: Marlon Tobaben, Mohamed Ali Souibgui, Rubèn Tito, Khanh Nguyen, Raouf Kerkouche, Kangsoo Jung, Joonas Jälkö, Lei Kang, Andrey Barsky, Vincent Poulain d'Andecy, Aurélie Joseph, Aashiq Muhamed, Kevin Kuo, Virginia Smith, Yusuke Yamasaki, Takumi Fukami, Kenta Niwa, Iifan Tyou, Hiro Ishii, Rio Yokota, Ragul N, Rintu Kutum, Josep Llados, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. ... Participants fine-tuned a pre-trained, state-of-the-art Document Visual Question Answering model provided by the organizers for this new domain... The competition analysis provides best practices and recommendations... In the PFL-DocVQA Competition three main aspects are evaluated: the model's utility, the communication cost during training, and the DP privacy budget spent through training the model. ... 4 Track 1: Communication Efficient Federated Learning ... 4.3 Winner Track 1: Muhamed, Kuo, and Smith ... Table 2: Competition Winners Track 1 (Communication efficient federated learning) ... Figure A.4: Convergence curves for the global model using (Left) validation loss, (Center) validation accuracy, and (Right) validation ANLS. |
| Researcher Affiliation | Collaboration | Competition Organizers: 1University of Helsinki, 2Computer Vision Center, Universitat Autònoma de Barcelona, 3CISPA Helmholtz Center for Information Security, 4INRIA, 5Yooz. Winning Competition Participants: 6Carnegie Mellon University, 7NTT, 8Institute of Science Tokyo, 9Department of Computer Science, and Mphasis AI & Applied Tech Lab at Ashoka, Ashoka University |
| Pseudocode | Yes | The update rules of our method, named FedShampoo, are outlined in Algorithm 1 in Appendix C. ... We propose DP-CLGECL, which introduces AdamW as the local update, client sampling, and the Gaussian mechanism in DP for CLGECL, as summarized in Algorithm 2. |
| Open Source Code | Yes | The starter kit is openly available: https://github.com/rubenpt91/PFL-DocVQA-Competition. ... Our code is shared on GitHub: https://github.com/imkevinkuo/PFL-DocVQA-Competition. ... The GitHub repository of our solution can be accessed at https://github.com/KutumLab/pfl-docvqa-with-LoRA. |
| Open Datasets | Yes | For this competition, we used the PFL-DocVQA dataset (Tito et al., 2024), the first dataset for private federated DocVQA. The dataset is created using invoice document images gathered from the DocILE dataset (Šimsa et al., 2023). ... The dataset is described in more detail in Tito et al. (2024) and is available to download. The dataset is based on images from the DocILE dataset (Šimsa et al., 2023)... |
| Dataset Splits | Yes | Following this, the base data used in this competition consists of a training set divided among N clients (we use N = 10), a validation set, and a test set (see Figure A.1). The training set of each of the N clients contains invoices sampled from a different subset of providers... Table A1: Statistics on the base PFL-DocVQA Dataset in terms of number of Providers/Documents/Pages/Question-Answers. |
| Hardware Specification | Yes | In our code, we train one model at a time using data parallelism. Specifically, we split each batch over 8 GPUs, resulting in a batch size of 2 per GPU (we used 8 GeForce GTX 1080 Ti GPUs). ... Computing environment: We used a server with 8 GPUs (NVIDIA A6000 for NVLink, 40GiB HBM2) and 2 CPUs (Xeon). ... We utilize two NVIDIA A40 (40 GB VRAM each) and train for some hours to obtain the baselines. |
| Software Dependencies | No | The code itself is based on established libraries such as PyTorch (Paszke et al., 2019) and the FL framework Flower (Beutel et al., 2020). |
| Experiment Setup | Yes | In all experiments, clients perform local fine-tuning with batch size = 16 and learning rate = 2e-4. ... To ensure a fair comparison of the two methods, several hyperparameters (learning rate η and element-wise clipping threshold C) were empirically tuned. This was done while maintaining fixed values for the total communication rounds R = 10, the number of inner loops for local updates L = 5000, and the number of sampled clients K = 2. In Figure A.3, a summary of our hyperparameter tuning for FedShampoo is provided. After performing empirical trials, we selected η = 2e-4 and C = 0.2. ... The updates are clipped to a norm of 0.5 and the Gaussian noise is computed so that the privacy budgets of ϵ ∈ {1, 4, 8} at δ = 10^-5 are spent at the end of training. |
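The DP mechanism quoted in the Experiment Setup row (clip each client update to a fixed norm, then add Gaussian noise calibrated to an (ϵ, δ) budget) can be sketched as follows. This is a minimal illustration, not the competition code: the function name `dp_aggregate`, the toy updates, and the `noise_multiplier` value are assumptions; in practice the noise multiplier would be calibrated with a DP accountant so the budget ϵ ∈ {1, 4, 8} at δ = 10^-5 is spent after all rounds.

```python
import numpy as np

def dp_aggregate(client_updates, clip_norm=0.5, noise_multiplier=1.0, rng=None):
    """Clip each client's update to L2 norm clip_norm, sum, add Gaussian noise,
    and average. clip_norm bounds each client's sensitivity, so the noise std
    is proportional to noise_multiplier * clip_norm (the Gaussian mechanism)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # shrink only if norm > clip_norm
        clipped.append(u * scale)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

# Example: K = 2 sampled clients with toy 4-dimensional updates
updates = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.1, 0.1, 0.0, 0.0])]
avg = dp_aggregate(updates)
```

With `noise_multiplier=0.0` the function reduces to plain clipped averaging, which is a convenient sanity check before enabling noise.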