$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
Authors: Zhaoyu Liu, Kan Jiang, Murong Ma, Zhe Hou, Yun Lin, Jin Song Dong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated popular temporal action understanding methods on F3Set, revealing substantial challenges for existing techniques. Additionally, we propose a new method, F3ED, for F3 event detection, achieving superior performance. Leveraging F3Set, we benchmark existing temporal action understanding methods, including TAL, TAS, and TASpot, and conduct a number of ablation studies on modeling choices to provide guidelines for future research. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set. |
| Researcher Affiliation | Academia | Zhaoyu Liu (1,2), Kan Jiang (2), Murong Ma (2), Zhe Hou (3), Yun Lin (4), Jin Song Dong (2); affiliations: (1) Ningbo University, (2) National University of Singapore, (3) Griffith University, (4) Shanghai Jiao Tong University. Email addresses are redacted in the source. |
| Pseudocode | No | The paper describes the F3ED model architecture in Section 4 with components like Video Encoder, Event Localizer, Multi-label Event Classifier, and Contextual module using mathematical formulations and descriptive text, but it does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set. |
| Open Datasets | Yes | To advance research in video understanding, we introduce F3Set, a benchmark that consists of video datasets for precise F3 event detection. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set. |
| Dataset Splits | Yes | We employ a training, validation, and testing split of 3:1:1, with the training and validation sets drawn from the same video sources, while the test set features clips from distinct videos. |
| Hardware Specification | No | The paper states that the proposed F3ED model "can be trained quickly on a single GPU" but does not specify the GPU model or type used. |
| Software Dependencies | No | The paper mentions 'We implement and train models on F3Set in an end-to-end manner.' and 'For more implementation details, please refer to Appendix F.' but does not provide specific software names with version numbers in the main text. |
| Experiment Setup | No | The paper states 'The default model takes stride size 2 and clip length 96.' and refers to Appendix F for further implementation details, but it does not provide concrete hyperparameter values such as learning rate, batch size, number of epochs, or optimizer settings in the main text. |
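The 3:1:1 split protocol described in the Dataset Splits row (train and validation drawn from the same video sources, test clips from distinct videos) can be sketched as follows. The function name and the `(video_id, clip)` record format are illustrative assumptions for this report, not the actual API of the F3Set codebase.

```python
import random

def split_f3set(clips, train_ratio=3, val_ratio=1, test_ratio=1, seed=0):
    """Sketch of a 3:1:1 split over (video_id, clip) records.

    Test clips come from whole videos held out entirely, while train
    and validation share the remaining video sources, mirroring the
    protocol described in the paper. Names here are hypothetical.
    """
    rng = random.Random(seed)
    total = train_ratio + val_ratio + test_ratio
    videos = sorted({vid for vid, _ in clips})
    rng.shuffle(videos)

    # Hold out enough whole videos to cover roughly test_ratio/total
    # of the clips for the test set.
    target_test = len(clips) * test_ratio // total
    test_videos, count = set(), 0
    for vid in videos:
        if count >= target_test:
            break
        test_videos.add(vid)
        count += sum(1 for v, _ in clips if v == vid)

    test = [c for c in clips if c[0] in test_videos]
    rest = [c for c in clips if c[0] not in test_videos]

    # Train and validation are drawn from the same (remaining) videos.
    rng.shuffle(rest)
    n_val = len(clips) * val_ratio // total
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test
```

The key design point is that the test partition is disjoint at the video level, not just the clip level, so a model cannot benefit from having seen other clips of the same match during training.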