DocVXQA: Context-Aware Visual Explanations for Document Question Answering
Authors: Mohamed Ali Souibgui, Changkyu Choi, Andrey Barsky, Kangsoo Jung, Ernest Valveny, Dimosthenis Karatzas
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments, including human evaluation, provide strong evidence supporting the effectiveness of our method. ... Section 4. Experiments and Results |
| Researcher Affiliation | Academia | 1Computer Vision Center, Universitat Autònoma de Barcelona, Spain; 2UiT The Arctic University of Norway, Norway; 3Inria, France. |
| Pseudocode | No | The paper describes the methodology using text and mathematical formulations (e.g., objective function in Equation 3) and flowcharts (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/dali92002/DocVXQA. |
| Open Datasets | Yes | The experiments are done on two datasets, DocVQA (Mathew et al., 2021) and PFL-DocVQA (Tito et al., 2024). |
| Dataset Splits | No | The paper states that experiments are done on the DocVQA and PFL-DocVQA datasets and refers to a "fine-tuned Pix2Struct to predict the answer," but it does not specify the train/validation/test splits used. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using Pix2Struct and the AdamW optimizer but does not specify version numbers for any software libraries, programming languages, or other dependencies. |
| Experiment Setup | Yes | Hyperparameters: learning rate 1×10⁻⁷; batch size 5; optimizer AdamW; γ = 0.5; β = 5; threshold (k) for postprocessing 3. |
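For readers attempting a reproduction, the reported hyperparameters can be captured in a small configuration object. This is a minimal sketch, not the authors' code: the `TrainConfig` class and `combined_loss` helper are hypothetical names, and the roles of γ and β as loss weights follow the paper's Equation 3 only in spirit (the paper fine-tunes Pix2Struct; a real run would plug these values into that training loop).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the paper's experiment setup."""
    learning_rate: float = 1e-7   # 1 × 10⁻⁷
    batch_size: int = 5
    optimizer: str = "AdamW"
    gamma: float = 0.5            # γ weighting term in the objective (Eq. 3)
    beta: float = 5.0             # β weighting term in the objective (Eq. 3)
    k_threshold: int = 3          # threshold k for postprocessing

def combined_loss(task: float, expl: float, ctx: float,
                  cfg: TrainConfig = TrainConfig()) -> float:
    # Hypothetical weighted sum illustrating how γ and β could combine
    # a task loss with auxiliary terms; not the authors' exact formulation.
    return task + cfg.gamma * expl + cfg.beta * ctx

cfg = TrainConfig()
print(cfg.learning_rate)              # 1e-07
print(combined_loss(1.0, 2.0, 0.1))   # 1.0 + 0.5*2.0 + 5.0*0.1 = 2.5
```

Keeping the configuration frozen and in one place makes it easy to log alongside results, which is exactly the information the paper's setup table provides.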