Small Language Model Makes an Effective Long Text Extractor
Authors: Yelin Chen, Fanjin Zhang, Jie Tang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method achieves state-of-the-art extraction accuracy on three long NER datasets and is capable of extracting entities from long texts in a GPU-memory-friendly manner. |
| Researcher Affiliation | Academia | Yelin Chen1*, Fanjin Zhang2*, Jie Tang2 — 1School of Computer Science and Technology, Xinjiang University, Urumqi 830049, China; 2Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not contain a dedicated section or figure labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code https://github.com/THUDM/scholarprofiling/tree/main/sener |
| Open Datasets | Yes | We conduct experiments on three NER datasets: Scholar-XL (Zhang et al. 2024), SciREX (Jain et al. 2020), and Profiling-07 (Tang, Zhang, and Yao 2007; Tang et al. 2008). |
| Dataset Splits | No | Hyper-parameters are selected based on the F1 score on the validation set. |
| Hardware Specification | Yes | All experiments are conducted on an 8-card 80G Nvidia A100 server. |
| Software Dependencies | No | We choose DeBERTa-V3-large (He, Gao, and Chen 2023) as the PLM for span-based methods and DiffusionNER. We use the AdamW (Loshchilov, Hutter et al. 2017) optimizer with a weight decay of 1e-2. |
| Experiment Setup | Yes | We use the AdamW (Loshchilov, Hutter et al. 2017) optimizer with a weight decay of 1e-2. The unilateral window sizes of the arrow attention and BiSPA mechanism are both set to 128. We only use low-rank adaptation on the Q and V matrices of the self-attention mechanism with a rank of 8. |
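The Experiment Setup row reports low-rank adaptation (LoRA) applied only to the Q and V projections with rank 8. A minimal numpy sketch of that low-rank update is below; the hidden size, scaling factor, and initialization scale are illustrative assumptions, not values from the paper:

```python
import numpy as np

def lora_delta(d_out, d_in, rank=8, alpha=8, rng=None):
    # LoRA update: delta_W = (alpha / rank) * B @ A, where
    # A (rank x d_in) is Gaussian-initialized and B (d_out x rank)
    # is zero-initialized, so the adapted weight equals the frozen
    # weight at step 0. Only A and B would be trained.
    rng = rng or np.random.default_rng(0)
    A = rng.normal(scale=0.02, size=(rank, d_in))
    B = np.zeros((d_out, rank))
    return (alpha / rank) * (B @ A)

d = 1024                      # hidden size (hypothetical)
W_q = np.eye(d)               # stand-in for the frozen Q projection
delta = lora_delta(d, d, rank=8)
W_q_adapted = W_q + delta

# At initialization the low-rank update is exactly zero:
assert np.allclose(W_q_adapted, W_q)
```

Restricting the adapters to Q and V (leaving K, the output projection, and the feed-forward layers frozen) keeps the number of trainable parameters small, which is consistent with the paper's stated goal of GPU-memory-friendly long-text extraction.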