Modality-Aware Shot Relating and Comparing for Video Scene Detection
Authors: Jiawei Tan, Hongxing Wang, Kang Dang, Jiaxin Li, Zhilong Ou
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on public benchmark datasets demonstrate that the proposed MASRC significantly advances video scene detection. |
| Researcher Affiliation | Academia | (1) Key Laboratory of Dependable Service Computing in Cyber Physical Society (Chongqing University), Ministry of Education, China; (2) School of Big Data and Software Engineering, Chongqing University, China; (3) School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, China. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methodology using mathematical equations and block diagrams (Figure 2), but it does not include explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/ExMorgan-Alter/MASRC |
| Open Datasets | Yes | Datasets: We assess the performance of our method on three widely used video scene detection datasets, i.e., MovieNet (Huang et al. 2020), BBC (Baraldi, Grana, and Cucchiara 2015), and OVSD (Rotman, Porat, and Ashour 2016). |
| Dataset Splits | Yes | MovieNet: It is a vast dataset with 1,100 movies and 1.6 million shots. 318 movies are annotated with scene boundaries, forming the MovieScenes dataset (Rao et al. 2020) for video scene detection. MovieScenes is further divided into subsets of 190 movies for training, 64 for validation, and 64 for testing. |
| Hardware Specification | Yes | We train our MASRC on a NVIDIA RTX 3060 GPU. |
| Software Dependencies | No | The paper specifies training hyperparameters, namely the Adam (Kingma and Ba 2015) optimizer with a mini-batch size of 512, an initial learning rate of 10^-4 for fully supervised and self-supervised learning, and 10^-3 for pre-training reduced to 10^-5 for fine-tuning in self-supervised transfer learning, with linear warm-up during the initial epoch followed by cosine learning-rate decay (He et al. 2019), but it does not name specific software libraries, frameworks, or versions. |
| Experiment Setup | Yes | Implementation Details: We take T = 14 neighboring shots as input to our model. In MASRC, we set the activation functions σ(·) in Eqs. (2), (6) and (8) as ReLU (He et al. 2015). In Eq. (1), we specify the number of top similar shots as k = 4. For model training, we employ the Adam (Kingma and Ba 2015) optimizer with a mini-batch size of 512. For fully supervised learning and self-supervised learning, we initialize the learning rate at 10^-4. In the case of self-supervised transfer learning, we set the initial learning rate to 10^-3 for pre-training and reduce it to 10^-5 for fine-tuning. Across all training stages, we apply a linear warm-up strategy during the initial epoch, followed by a learning rate decay according to a cosine schedule (He et al. 2019). ... In all experiments, we report the average of metrics across five different random seeds. |
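The learning-rate schedule quoted above (linear warm-up during the first epoch, then cosine decay, per He et al. 2019) can be sketched as follows. This is a minimal illustration, not code from the MASRC repository; the step counts in the example are assumptions, and only the base learning rate of 10^-4 comes from the paper.

```python
import math

def learning_rate(step, total_steps, warmup_steps, base_lr=1e-4):
    """Linear warm-up followed by cosine decay to zero.

    step: current optimizer step (0-indexed)
    warmup_steps: steps in the warm-up phase (the paper warms up
    for one epoch; the exact step count is an assumption here)
    """
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

# Hypothetical run: 100 total steps, 10 of them warm-up.
lrs = [learning_rate(s, total_steps=100, warmup_steps=10) for s in range(100)]
```

The schedule peaks at `base_lr` exactly when warm-up ends and decays smoothly toward zero afterward; in practice one would hand an equivalent function to a framework scheduler (e.g. a per-step lambda) rather than compute it manually.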