Action-Agnostic Point-Level Supervision for Temporal Action Detection
Authors: Shuhei M. Yoshida, Takashi Shibata, Makoto Terao, Takayuki Okatani, Masashi Sugiyama
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on a variety of datasets (THUMOS'14, FineAction, GTEA, BEOID, and ActivityNet 1.3) demonstrate that the proposed approach is competitive with or outperforms prior methods for video-level and point-level supervision in terms of the trade-off between the annotation cost and detection performance. We also find that even training only with annotated frames can achieve competitive results with the previous studies. This suggests the inherent effectiveness of AAPL supervision. |
| Researcher Affiliation | Collaboration | Shuhei M. Yoshida1, Takashi Shibata1, Makoto Terao1, Takayuki Okatani2,3, Masashi Sugiyama3,4 1Visual Intelligence Research Laboratories, NEC Corporation, Kanagawa 211-8666, Japan 2Graduate School of Information Sciences, Tohoku University, Miyagi 980-8579, Japan 3RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan 4Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan |
| Pseudocode | No | The paper describes the action detection model and training objectives in Section 3, but it does not present these in structured pseudocode or algorithm blocks. The methods are explained in paragraph text. |
| Open Source Code | Yes | Code: https://github.com/smy-nec/AAPL |
| Open Datasets | Yes | Extensive experiments on a variety of datasets (THUMOS'14, FineAction, GTEA, BEOID, and ActivityNet 1.3) demonstrate that the proposed approach is competitive with or outperforms prior methods for video-level and point-level supervision in terms of the trade-off between the annotation cost and detection performance. BEOID (Damen et al. 2014), GTEA (Fathi, Ren, and Rehg 2011), THUMOS'14 (Jiang et al. 2014), FineAction (Liu et al. 2022), and ActivityNet 1.3 (Heilbron et al. 2015). |
| Dataset Splits | Yes | We adopt the training-validation split from Ma et al. (2020). For THUMOS'14, following the convention (Wang et al. 2017; Nguyen et al. 2018), we use the validation set for training and the test set for evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'modified version of the VGG Image Annotator (VIA) (Dutta, Gupta, and Zisserman 2016; Dutta and Zisserman 2019)' for annotation time measurement. However, it does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, Python) used for implementing the core methodology, which would be necessary for replication. |
| Experiment Setup | No | We defer details of implementation and hyperparameters to the extended version (Yoshida et al. 2024). |