OLMD: Orientation-aware Long-term Motion Decoupling for Continuous Sign Language Recognition
Authors: Yiheng Yu, Sheng Liu, Yuan Feng, Min Xu, Zhelun Jin, Xuhua Yang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, OLMD shows SOTA performance on three large-scale datasets: PHOENIX14 (Forster et al. 2015), PHOENIX14-T (Camgoz et al. 2018), and CSL-Daily (Zhou et al. 2021). Notably, we improve the word error rate (WER) on PHOENIX14 by an absolute 1.6% compared to the previous SOTA. Fig. 1b highlights the excellent performance of OLMD on PHOENIX14. |
| Researcher Affiliation | Academia | Zhejiang University of Technology |
| Pseudocode | No | The paper describes the methodology using mathematical equations and block diagrams (e.g., Figure 2 and Figure 3), but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | In this paper, we mainly use three large CSLR datasets: PHOENIX14 (Forster et al. 2015) is a popular CSLR dataset from German TV weather reports with 1,295 words, recorded by 9 signers. PHOENIX14-T (Camgoz et al. 2018) is an expanded version of PHOENIX-2014, offering 7,096 training, 519 development (Dev), and 642 testing (Test) videos. CSL-Daily (Zhou et al. 2021) is a large-scale Chinese sign language dataset filmed by 10 different signers, covering a variety of daily life themes with over 20,000 sentences. |
| Dataset Splits | Yes | PHOENIX14 ... It includes 5,672 training, 540 development (Dev), and 629 testing (Test) videos. PHOENIX14-T ... offering 7,096 training, 519 development (Dev), and 642 testing (Test) videos. CSL-Daily ... The dataset is split into 18,401 training samples, 1,077 development samples, and 1,176 test samples, featuring 2,000 sign language and 2,343 Chinese text vocabularies. |
| Hardware Specification | Yes | Finally, all the training and testing are completed on 1 NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions models like ResNet34, 1D-CNNs, Bi LSTM, and optimizers like Adam, but does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | During training, we set the batch size to 2 and the initial learning rate to 0.001, reduced to 30% of its value at epochs 25 and 40. We default to using the Adam optimizer with a weight decay of 0.001, iterating for a total of 70 epochs. All input frames are first resized to 256x256 and then randomly cropped to 224x224 during training, with a 50% chance of horizontal flipping and a 20% probability of temporal scale adjustment. For inference, we simply use a central crop of 224x224. |
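The learning-rate schedule quoted in the Experiment Setup row (initial LR 0.001, cut to 30% of its current value at epochs 25 and 40, 70 epochs total) can be sketched as a small helper. This is our own illustrative reading of the paper's description, not the authors' released code (none is available); the function name and structure are ours, and the decay is assumed to be the usual MultiStepLR-style multiplicative step with factor 0.3.

```python
def learning_rate(epoch: int, base_lr: float = 0.001) -> float:
    """Sketch of the schedule described in the paper (assumption: a
    MultiStepLR-style decay, multiplying the LR by 0.3 at each milestone).

    Epochs 0-24 train at 1e-3, epochs 25-39 at 3e-4, epochs 40-69 at 9e-5.
    """
    lr = base_lr
    for milestone in (25, 40):
        if epoch >= milestone:
            lr *= 0.3  # "reducing to 30%" read as 30% of the current value
    return lr
```

In a PyTorch setup this would correspond to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[25, 40], gamma=0.3)` paired with `Adam(..., lr=0.001, weight_decay=0.001)`, though the paper does not state which framework was used.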