$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos

Authors: Zhaoyu Liu, Kan Jiang, Murong Ma, Zhe Hou, Yun Lin, Jin Song Dong

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated popular temporal action understanding methods on F3Set, revealing substantial challenges for existing techniques. Additionally, we propose a new method, F3ED, for F3 event detection, achieving superior performance. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set. To provide guidelines for future research, we conduct a number of ablation studies on modeling choices. In this section, we benchmark existing temporal action understanding methods, including TAL, TAS, and TASpot, on the F3Set dataset and conduct a series of ablation studies.
Researcher Affiliation | Academia | Zhaoyu Liu1,2, Kan Jiang2, Murong Ma2, Zhe Hou3, Yun Lin4, Jin Song Dong2. 1 Ningbo University, 2 National University of Singapore, 3 Griffith University, 4 Shanghai Jiao Tong University.
Pseudocode | No | The paper describes the F3ED model architecture in Section 4, with components such as a Video Encoder, Event Localizer, Multi-label Event Classifier, and Contextual module, using mathematical formulations and descriptive text, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set.
Open Datasets | Yes | To advance research in video understanding, we introduce F3Set, a benchmark that consists of video datasets for precise F3 event detection. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set.
Dataset Splits | Yes | We employ a training, validation, and testing split of 3:1:1, with the training and validation sets drawn from the same video sources, while the test set features clips from distinct videos.
Hardware Specification | No | The paper states that "Our proposed F3ED model... can be trained quickly on a single GPU," but it does not specify the model or type of GPU used.
Software Dependencies | No | The paper mentions "We implement and train models on F3Set in an end-to-end manner" and "For more implementation details, please refer to Appendix F," but does not provide specific software names with version numbers in the main text.
Experiment Setup | No | The paper states "The default model takes stride size 2 and clip length 96" and refers to Appendix F for more implementation details. However, it does not provide concrete hyperparameter values such as learning rate, batch size, number of epochs, or optimizer settings in the main text.
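The 3:1:1 split described above (train and validation from the same source videos, test from distinct videos) can be sketched as follows. This is an illustrative reimplementation, not the authors' released code; the `split_f3set` function and the `"video"` field on each clip record are assumptions.

```python
import random

def split_f3set(clips, seed=0):
    """Hypothetical 3:1:1 split: clips is a list of dicts with a 'video' key
    naming the source video. The test set uses videos held out entirely."""
    rng = random.Random(seed)
    videos = sorted({c["video"] for c in clips})
    rng.shuffle(videos)

    # Reserve roughly 1/5 of the source videos exclusively for testing,
    # so test clips come from videos never seen in train/val.
    n_test = max(1, len(videos) // 5)
    test_videos = set(videos[:n_test])

    test = [c for c in clips if c["video"] in test_videos]
    rest = [c for c in clips if c["video"] not in test_videos]

    # Split the remaining clips 3:1 into train/val; these may share
    # source videos, matching the paper's description.
    rng.shuffle(rest)
    n_val = len(rest) // 4
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test
```

Holding out whole videos for the test set, rather than sampling clips uniformly, is what makes the test evaluation measure generalization to unseen footage.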
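The only model-input hyperparameters quoted from the paper are a stride of 2 and a clip length of 96. A minimal sketch of what that sampling implies (the helper name is hypothetical; the paper does not show this code):

```python
def clip_frame_indices(start, clip_length=96, stride=2):
    """Frame indices for one input clip beginning at frame `start`:
    clip_length frames sampled every `stride` frames, so a clip of
    96 frames at stride 2 covers 191 consecutive video frames."""
    return [start + i * stride for i in range(clip_length)]

idx = clip_frame_indices(0)
# 96 indices spanning frames 0..190
```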