SCOPE: Sign Language Contextual Processing with Embedding from LLMs
Authors: Yuqi Liu, Wenqian Zhang, Sihan Ren, Chengyu Huang, Jingyi Yu, Lan Xu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our SCOPE framework achieves state-of-the-art performance on multiple datasets, including Phoenix-2014T, CSL-Daily, and our SCOPE dataset. Moreover, surveys conducted with participants from the Deaf community further validate the robustness and effectiveness of our approach in real-world applications. We conduct ablation experiments for both SLR and SLT tasks to validate the contributions of each component. |
| Researcher Affiliation | Academia | Yuqi Liu*, Wenqian Zhang*, Sihan Ren, Chengyu Huang, Jingyi Yu, Lan Xu, ShanghaiTech University |
| Pseudocode | No | The paper describes the methodology using text and mathematical formulas (e.g., equations 2, 3, 4, 5, 6, 7, 8, 9) but does not include any clearly labeled pseudocode blocks or algorithm sections. |
| Open Source Code | Yes | Code and Supplementary Materials https://github.com/Godheritage/SCOPE |
| Open Datasets | Yes | We also contribute a new sign language dataset that contains 72 hours of Chinese sign language videos in contextual dialogues across various scenarios. Our benchmark dataset and baseline approach will be made publicly available. Experimental results demonstrate that our SCOPE framework achieves state-of-the-art performance on multiple datasets, including Phoenix-2014T (Camgoz et al. 2018), CSL-Daily (Zhou et al. 2021a), and our SCOPE dataset. |
| Dataset Splits | Yes | Train/dev/test splits of the existing datasets are maintained. For our SCOPE dataset, we follow (Zhang et al. 2024) to use widely adopted split ratios to randomly split our dataset by 80%, 5% and 15% into train, dev, and test sets, carefully ensuring that no same sentence appears in different sets and any sentence in the dev set or test set does not appear in context dialogues of the training set. |
| Hardware Specification | Yes | All experiments are executed on 8 NVIDIA A800 GPUs. |
| Software Dependencies | No | The paper mentions several tools and models, such as OpenAI’s text-embedding-ada-002, DWPose, and the Qwen2 LLM, but it does not specify version numbers for these or for any other software libraries or programming languages used (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The embedding alignment encoder and gloss encoder are both 8-head transformer encoders with 2 and 4 layers, respectively, with hidden size 1568 and feed-forward size 3136. We adopt the AdamW optimizer and use cosine annealing schedules, with 20 epochs focusing on alignment embedding, and 60 epochs for gloss encoder training while keeping the previous module frozen. |
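The SCOPE dataset split described above (80%/5%/15% with no sentence shared across splits) can be sketched by grouping samples by sentence before shuffling. This is a minimal illustration, not the authors' released code; the `sentence` field and sample structure are assumptions.

```python
import random
from collections import defaultdict

def split_by_sentence(samples, ratios=(0.80, 0.05, 0.15), seed=0):
    """Split samples into train/dev/test so that no sentence
    appears in more than one split (hypothetical sketch)."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["sentence"]].append(sample)

    keys = sorted(groups)                 # deterministic order before shuffling
    random.Random(seed).shuffle(keys)

    n = len(keys)
    n_train = int(ratios[0] * n)
    n_dev = int(ratios[1] * n)

    train = [s for k in keys[:n_train] for s in groups[k]]
    dev = [s for k in keys[n_train:n_train + n_dev] for s in groups[k]]
    test = [s for k in keys[n_train + n_dev:] for s in groups[k]]
    return train, dev, test
```

Grouping by sentence first is what enforces the paper's constraint that a sentence in dev or test never appears in the training set; a naive per-sample shuffle would not guarantee this.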
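The reported architecture hyperparameters (8-head encoders, 2 and 4 layers, hidden size 1568, feed-forward size 3136) and the two-stage AdamW/cosine-annealing schedule can be sketched in PyTorch as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the learning rate is not reported in the table and is a placeholder.

```python
import torch
import torch.nn as nn

HIDDEN, FFN, HEADS = 1568, 3136, 8  # sizes reported in the paper

def make_encoder(num_layers):
    layer = nn.TransformerEncoderLayer(
        d_model=HIDDEN, nhead=HEADS, dim_feedforward=FFN, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

align_encoder = make_encoder(2)   # embedding alignment encoder (2 layers)
gloss_encoder = make_encoder(4)   # gloss encoder (4 layers)

# Stage 1: 20 epochs training the alignment module.
# Stage 2: 60 epochs training the gloss encoder with the alignment
# module frozen, as described above. lr=1e-4 is an assumption.
for p in align_encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(gloss_encoder.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
```

Note that `d_model=1568` is divisible by the 8 attention heads (196 per head), which `nn.TransformerEncoderLayer` requires.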