SCOPE: Sign Language Contextual Processing with Embedding from LLMs
Authors: Yuqi Liu, Wenqian Zhang, Sihan Ren, Chengyu Huang, Jingyi Yu, Lan Xu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our SCOPE framework achieves state-of-the-art performance on multiple datasets, including Phoenix-2014T, CSL-Daily, and our SCOPE dataset. Moreover, surveys conducted with participants from the Deaf community further validate the robustness and effectiveness of our approach in real-world applications. We conduct ablation experiments for both SLR and SLT tasks to validate the contributions of each component. |
| Researcher Affiliation | Academia | Yuqi Liu*, Wenqian Zhang*, Sihan Ren, Chengyu Huang, Jingyi Yu, Lan Xu, ShanghaiTech University |
| Pseudocode | No | The paper describes the methodology using text and mathematical formulas (e.g., equations 2, 3, 4, 5, 6, 7, 8, 9) but does not include any clearly labeled pseudocode blocks or algorithm sections. |
| Open Source Code | Yes | Code and Supplementary Materials https://github.com/Godheritage/SCOPE |
| Open Datasets | Yes | We also contribute a new sign language dataset that contains 72 hours of Chinese sign language videos in contextual dialogues across various scenarios. Our benchmark dataset and baseline approach will be made publicly available. Experimental results demonstrate that our SCOPE framework achieves state-of-the-art performance on multiple datasets, including Phoenix-2014T (Camgoz et al. 2018), CSL-Daily (Zhou et al. 2021a), and our SCOPE dataset. |
| Dataset Splits | Yes | Train/dev/test splits of the existing datasets are maintained. For our SCOPE dataset, we follow (Zhang et al. 2024) to use widely adopted split ratios to randomly split our dataset by 80%, 5% and 15% into train, dev, and test sets, carefully ensuring that no same sentence appears in different sets and any sentence in the dev set or test set does not appear in context dialogues of the training set. |
| Hardware Specification | Yes | All experiments are executed on 8 NVIDIA A800 GPUs. |
| Software Dependencies | No | The paper mentions several tools and models, such as OpenAI’s text-embedding-ada-002, DWPose, and the Qwen2 LLM, but it does not specify version numbers for these or for any other software libraries or programming languages used (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The embedding alignment encoder and gloss encoder are both 8-head transformer encoders with 2 and 4 layers, respectively, with hidden size 1568 and feed-forward size 3136. We adopt the AdamW optimizer and use cosine annealing schedules, with 20 epochs focusing on alignment embedding, and 60 epochs for gloss encoder training while keeping the previous module frozen. |
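The SCOPE dataset split described above (80%/5%/15% with no sentence shared across splits) can be sketched by grouping samples by sentence before shuffling. This is a minimal illustration, not the authors' released code; the `sentence` field and sample structure are assumptions.

```python
import random
from collections import defaultdict

def split_by_sentence(samples, ratios=(0.80, 0.05, 0.15), seed=0):
    """Split samples into train/dev/test so that no sentence
    appears in more than one split (hypothetical sketch)."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["sentence"]].append(sample)

    keys = sorted(groups)                 # deterministic order before shuffling
    random.Random(seed).shuffle(keys)

    n = len(keys)
    n_train = int(ratios[0] * n)
    n_dev = int(ratios[1] * n)

    train = [s for k in keys[:n_train] for s in groups[k]]
    dev = [s for k in keys[n_train:n_train + n_dev] for s in groups[k]]
    test = [s for k in keys[n_train + n_dev:] for s in groups[k]]
    return train, dev, test
```

Grouping by sentence first is what enforces the paper's constraint that a sentence in dev or test never appears in the training set; a naive per-sample shuffle would not guarantee this.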
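The reported architecture hyperparameters (8-head encoders, 2 and 4 layers, hidden size 1568, feed-forward size 3136) and the two-stage AdamW/cosine-annealing schedule can be sketched in PyTorch as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the learning rate is not reported in the table and is a placeholder.

```python
import torch
import torch.nn as nn

HIDDEN, FFN, HEADS = 1568, 3136, 8  # sizes reported in the paper

def make_encoder(num_layers):
    layer = nn.TransformerEncoderLayer(
        d_model=HIDDEN, nhead=HEADS, dim_feedforward=FFN, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

align_encoder = make_encoder(2)   # embedding alignment encoder (2 layers)
gloss_encoder = make_encoder(4)   # gloss encoder (4 layers)

# Stage 1: 20 epochs training the alignment module.
# Stage 2: 60 epochs training the gloss encoder with the alignment
# module frozen, as described above. lr=1e-4 is an assumption.
for p in align_encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(gloss_encoder.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
```

Note that `d_model=1568` is divisible by the 8 attention heads (196 per head), which `nn.TransformerEncoderLayer` requires.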