SAIL: Sample-Centric In-Context Learning for Document Information Extraction

Authors: Jinyu Zhang, Zhiyuan You, Jize Wang, Xinyi Le

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. Evidence: "4 Experiments; 4.1 Datasets, Metrics, and Details; 4.2 Results on DIE Benchmarks; 4.3 Comparison with Multi-modal LLMs; 4.4 Ablation Studies"
Researcher Affiliation: Academia. Evidence: "Jinyu Zhang¹*, Zhiyuan You²*, Jize Wang¹, Xinyi Le¹; ¹Shanghai Jiao Tong University; ²The Chinese University of Hong Kong; EMAIL, EMAIL"
Pseudocode: No. The paper includes illustrations of the framework (Figure 2) but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. Code available at https://github.com/sky-goldfish/SAIL
Open Datasets: Yes. Evidence: "FUNSD (Jaume, Ekenel, and Thiran 2019) is a dataset for understanding the content of tables in scanned documents. ... SROIE (Huang et al. 2019) is another scanned receipt understanding dataset... CORD (Park et al. 2019) is a receipt understanding dataset..."
Dataset Splits: Yes. Evidence: "FUNSD (Jaume, Ekenel, and Thiran 2019) is a dataset for understanding the content of tables in scanned documents. It contains 149 tables and 7,411 entities in the training set, and 50 tables and 2,332 entities in the test set. ... SROIE (Huang et al. 2019) is another scanned receipt understanding dataset, containing 626 receipts in the training set and 347 in the test set. ... CORD (Park et al. 2019) is a receipt understanding dataset that contains 800 training data, 100 test data, and 100 validation data."
Hardware Specification: No. The paper names specific LLM APIs (GPT-3.5, GPT-4o) and a specific open-source model version (chatglm3-6b-32k), but gives no details on the hardware used to run the experiments or host the models.
Software Dependencies: No. The paper mentions ChatGLM3 (chatglm3-6b-32k), GPT-3.5 (gpt-3.5-turbo API), GPT-4o (gpt-4o API), and Sentence-BERT, but provides no version numbers for ancillary dependencies such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup: Yes. Evidence: "For GPT-3.5 and GPT-4o, we set the temperature parameter to 0 to enhance the reproducibility. In our experiments, for each test document, we select four textually similar documents and four layout-similar documents as examples due to the limitation of prompt token number. Furthermore, for each filtered test entity, we choose four textually similar entity examples."
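The example-selection step quoted above (picking the four most similar training documents for each test document) can be sketched as a top-k cosine-similarity lookup over embedding vectors. This is an illustrative sketch only: the random toy vectors stand in for real Sentence-BERT document embeddings, and the function name `top_k_similar` is not from the paper.

```python
import numpy as np

def top_k_similar(query_vec, example_vecs, k=4):
    """Return indices of the k examples most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    e = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    sims = e @ q                      # cosine similarity to each example
    return np.argsort(-sims)[:k].tolist()

# Toy embeddings standing in for Sentence-BERT document vectors.
rng = np.random.default_rng(0)
train_embeddings = rng.normal(size=(10, 8))   # 10 training documents
# A test document nearly identical to training document 3.
test_embedding = train_embeddings[3] + 0.01 * rng.normal(size=8)

picked = top_k_similar(test_embedding, train_embeddings, k=4)
print(picked)  # training document 3 ranks first
```

The same routine would be run twice per test document under the paper's setup, once over text embeddings and once over layout representations, to gather the four textually similar and four layout-similar examples.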