Small Language Model Makes an Effective Long Text Extractor

Authors: Yelin Chen, Fanjin Zhang, Jie Tang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our method achieves state-of-the-art extraction accuracy on three long NER datasets and is capable of extracting entities from long texts in a GPU-memory-friendly manner."
Researcher Affiliation | Academia | "Yelin Chen1*, Fanjin Zhang2*, Jie Tang2; 1School of Computer Science and Technology, Xinjiang University, Urumqi 830049, China; 2Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; EMAIL, EMAIL"
Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not contain a dedicated section or figure labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | "Code: https://github.com/THUDM/scholarprofiling/tree/main/sener"
Open Datasets | Yes | "We conduct experiments on three NER datasets: Scholar-XL (Zhang et al. 2024), SciREX (Jain et al. 2020), and Profiling-07 (Tang, Zhang, and Yao 2007; Tang et al. 2008)."
Dataset Splits | No | "Hyper-parameters are selected based on the F1 score on the validation set."
Hardware Specification | Yes | "All experiments are conducted on an 8-card 80G Nvidia A100 server."
Software Dependencies | No | "We choose DeBERTa-V3-large (He, Gao, and Chen 2023) as the PLM for span-based methods and DiffusionNER. We use the AdamW (Loshchilov and Hutter 2017) optimizer with a weight decay of 1e-2."
Experiment Setup | Yes | "We use the AdamW (Loshchilov and Hutter 2017) optimizer with a weight decay of 1e-2. The unilateral window sizes of the arrow attention and BiSPA mechanism are both set to 128. We only use low-rank adaptation on the Q and V matrices of the self-attention mechanism with a rank of 8."
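The low-rank adaptation described in the Experiment Setup row (rank-8 LoRA applied only to the Q and V projections) can be sketched as follows. This is a minimal NumPy illustration of the LoRA update rule, not the paper's implementation; the hidden size, scaling factor, and initialization here are illustrative assumptions.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=8):
    """LoRA-adapted linear layer: y = x W^T + (alpha / r) * x A^T B^T.

    W is the frozen pretrained projection (e.g. a Q or V matrix);
    only the low-rank factors A (r x d) and B (d x r) are trained.
    """
    r = A.shape[0]  # LoRA rank
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d = 64                                # hidden size (illustrative only)
r = 8                                 # rank 8, as reported in the setup
x = rng.normal(size=(2, d))           # a tiny batch of token vectors
W = rng.normal(size=(d, d))           # frozen Q (or V) projection weight
A = rng.normal(size=(r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero-initialized

y = lora_forward(x, W, A, B)
# With B zero-initialized, the adapted layer initially matches the frozen one,
# so fine-tuning starts from the pretrained model's behavior.
assert np.allclose(y, x @ W.T)
```

Because only A and B receive gradients, the trainable parameter count per adapted matrix is 2*d*r instead of d*d, which is what makes fine-tuning a large PLM feasible on the reported hardware.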