Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

Authors: Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yi Lu, Bozheng Li, Weiheng Chi, Zihan Qiu, Lirian Su, Haolin Zheng, Jay Wu, Xu Yang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experimental analysis and comparison with other state-of-the-art video highlight detection models, the authors demonstrate the superior performance and practical applicability of their proposed model for this task. The paper includes sections such as "Experiments Setting," "Results and Analysis," and various ablation studies.
Researcher Affiliation | Collaboration | The authors are affiliated with "1Southeast University," "2Opus AI Research," "3University of Toronto," "4Brown University," and "5National University of Singapore." This mix of universities (academic) and a private company (Opus AI Research) indicates a collaborative affiliation type.
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. The methodology is described in prose and mathematical formulations.
Open Source Code | Yes | Code: https://github.com/yongliang-wu/Repurpose
Open Datasets | Yes | We introduce Repurpose-10K, a large-scale dataset specifically curated for the video repurposing task. ... Code: https://github.com/yongliang-wu/Repurpose. Additionally, the PANN model is trained on AudioSet (Gemmeke et al. 2017).
Dataset Splits | Yes | We partition the dataset into train/val/test splits at a ratio of 8/1/1.
Hardware Specification | Yes | All experiments are conducted on two A100 GPUs within the PyTorch framework.
Software Dependencies | No | The paper mentions the PyTorch framework but does not specify a version number for it or for other key software dependencies (e.g., Python, CUDA) required for replication. It mentions models such as WhisperX, CLIP ViT-B/32, PANN, and all-MiniLM-L6-v2, but without their specific library versions.
Experiment Setup | Yes | The embedding dimension of the model is set to d = 512, and the numbers of layers Ns, Nc, and Nf are set to 3. We utilize the Adam optimizer with a learning rate of 1e-4 for 100 epochs, adjusted using cosine learning-rate decay, while the first 5 epochs employ linear warm-up to facilitate stable learning. The hyper-parameters λ1 through λ4 are set to 0.1, 0.3, 0.1, and 0.7, respectively.
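The 8/1/1 train/val/test partition reported above can be sketched as a simple shuffled split. This is an illustrative reconstruction only; the paper's actual split assignments are not reproduced here, and the seed is arbitrary:

```python
import random

def split_dataset(items, seed=42):
    """Partition items into train/val/test at an 8/1/1 ratio.

    Illustrative sketch: shuffles with a fixed (arbitrary) seed and
    slices at the 80% and 90% marks.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Example: 10,000 clips -> 8,000 / 1,000 / 1,000
train, val, test = split_dataset(range(10000))
```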
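The training schedule quoted in the setup row (base learning rate 1e-4 over 100 epochs, cosine decay, 5-epoch linear warm-up) can be written out as a small helper. This is a sketch of the standard warm-up-plus-cosine recipe under those reported hyper-parameters, not code from the paper's repository:

```python
import math

# Hyper-parameters as reported in the Experiment Setup row.
BASE_LR = 1e-4
TOTAL_EPOCHS = 100
WARMUP_EPOCHS = 5

def learning_rate(epoch):
    """Per-epoch learning rate: linear warm-up, then cosine decay to ~0."""
    if epoch < WARMUP_EPOCHS:
        # Linear warm-up over the first 5 epochs.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

lrs = [learning_rate(e) for e in range(TOTAL_EPOCHS)]
```

The warm-up reaches the full 1e-4 at epoch 4, and the cosine term then decays the rate smoothly toward zero by epoch 99.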