Temporal Action Localization with Cross Layer Task Decoupling and Refinement
Authors: Qiang Li, Di Liu, Jun Kong, Sen Li, Hui Xu, Jianzhong Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on five challenging datasets including THUMOS14 (Yeung et al. 2018a), MultiTHUMOS (Yeung et al. 2018b), EPIC-KITCHENS-100 (Damen et al. 2022), ActivityNet-1.3 (Heilbron et al. 2015), and HACS (Zhao et al. 2019) to validate the effectiveness of our method. Evaluation Metric: To evaluate the performance of our CLTDR-GMG, we employ the widely adopted mAP metric across various temporal IoU (tIoU) thresholds. |
| Researcher Affiliation | Academia | ¹Northeast Normal University, ²Northeast Electric Power University, ³KLAS of MOE, ⁴Changchun Humanities and Sciences College; EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using mathematical equations and descriptive text, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code: https://github.com/LiQiang0307/CLTDR-GMG |
| Open Datasets | Yes | We conduct experiments on five challenging datasets including THUMOS14 (Yeung et al. 2018a), MultiTHUMOS (Yeung et al. 2018b), EPIC-KITCHENS-100 (Damen et al. 2022), ActivityNet-1.3 (Heilbron et al. 2015), and HACS (Zhao et al. 2019) to validate the effectiveness of our method. |
| Dataset Splits | No | Following most approaches (Zhang, Wu, and Li 2022; Shi et al. 2023), we leverage the pre-trained two-stream I3D (Carreira and Zisserman 2017) on Kinetics (Kay et al. 2017) to extract features from the THUMOS14 dataset. For the experiment on ActivityNet-1.3, we use TSP R(2+1)D (Alwassel, Giancola, and Ghanem 2021) as the pre-trained model to extract video features, which is consistent with the methodology used in several recent studies (Shi et al. 2023). The paper refers to other works for methodology and standard splits but does not provide specific split percentages or sample counts for training, validation, and testing sets within the paper itself. |
| Hardware Specification | Yes | We conduct our experiments using Python 3.8, PyTorch 2.0, and CUDA 11.8 on an NVIDIA RTX 4090 GPU. |
| Software Dependencies | Yes | We conduct our experiments using Python 3.8, PyTorch 2.0, and CUDA 11.8 on an NVIDIA RTX 4090 GPU. |
| Experiment Setup | Yes | We employ AdamW with warm-up and a cosine annealing learning rate schedule for model optimization. The number of feature pyramid layers is set to L=6 for THUMOS14, MultiTHUMOS and EPIC-KITCHENS-100, and L=7 for ActivityNet-1.3 and HACS. Following other studies (Shi et al. 2023), we utilize center sampling to identify the positive samples. |
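The evaluation metric quoted above (mAP over temporal IoU thresholds) rests on a simple overlap measure between predicted and ground-truth action segments. A minimal sketch of that measure, with an illustrative function name and `(start, end)` segment format not taken from the paper:

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) in seconds.

    Intersection is the overlap of the two intervals; union is the total
    covered duration. mAP is then computed by thresholding this value
    (e.g. tIoU >= 0.5) when matching predictions to ground truth.
    """
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, a prediction covering seconds 0–10 against a ground-truth segment at 5–15 overlaps for 5 s out of a 15 s union, giving tIoU = 1/3, which would count as a miss at the common 0.5 threshold.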
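The optimization recipe in the setup row (AdamW with warm-up followed by cosine annealing) can be sketched as a learning-rate function; the base rate, warm-up length, and total epochs below are assumed placeholder values, not hyperparameters reported by the paper:

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=5, total_epochs=50):
    """Linear warm-up, then cosine annealing to zero (assumed hyperparameters)."""
    if epoch < warmup_epochs:
        # ramp linearly from base_lr/warmup_epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # cosine decay from base_lr at the end of warm-up down to 0 at total_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop the same shape is typically obtained by wrapping `torch.optim.AdamW` in a `LambdaLR` or `CosineAnnealingLR` scheduler; the standalone function here just makes the schedule's arithmetic explicit.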