TeaserGen: Generating Teasers for Long Documentaries

Authors: Weihan Xu, Paul Pu Liang, Haven Kim, Julian McAuley, Taylor Berg-Kirkpatrick, Hao-Wen (Herman) Dong

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results show that the pretraining-based approach is more effective at identifying relevant visual content than directly trained deep autoregressive models.
Researcher Affiliation Academia Weihan Xu1 Paul Pu Liang2 Haven Kim3 Julian McAuley3 Taylor Berg-Kirkpatrick3 Hao-Wen Dong4 1Duke University 2Massachusetts Institute of Technology 3University of California San Diego 4University of Michigan EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode No The paper describes the search algorithm in Appendix O, titled "DETAILS OF SEARCH ALGORITHM", in paragraph form without structured pseudocode blocks like 'Input:', 'Output:', 'for loops', or 'if-else' statements.
Open Source Code Yes The DocumentaryNet dataset, along with all source code and demos, can be found on our website. 1https://wx83.github.io/TeaserGen_Official/ ... For reproducibility, we will release all the source code.
Open Datasets Yes In this paper, we present a new documentary dataset with 1,269 high-quality documentaries paired with their teasers. The proposed DocumentaryNet dataset features various modalities such as video, speech, music, sound effects, narrations and tags. ... We propose DocumentaryNet, a publicly-available dataset consisting of 1,269 high-quality documentaries paired with their teasers. ... The DocumentaryNet dataset, along with all source code and demos, can be found on our website. 1https://wx83.github.io/TeaserGen_Official/
Dataset Splits Yes We evaluate our model on a test set of 49 documentaries. We include dataset split details in Appendix K. ... Table 15: Dataset Split Details — Train: 1026 samples, 514 hours; Validation: 57 samples, 32 hours; Test: 49 samples, 29 hours.
Hardware Specification Yes All experiments are conducted on an NVIDIA RTX A6000 GPU.
Software Dependencies No The paper mentions using specific models like CLIP-ViT-B/32 and CLIP-ViT-L/14, and the Adam optimizer (Kingma & Ba, 2017), but does not provide specific version numbers for core software components such as Python, PyTorch, or CUDA libraries, which are essential for reproducibility.
Experiment Setup Yes For TeaserGen-LR, we utilize 3 transformer layers with a hidden dimension of 768, and we use the L2 distance between the ground truth image embeddings and the generated ones as the loss function. The batch size is set to 16, and we optimize using the Adam optimizer (Kingma & Ba, 2017) with a learning rate of 1e-4. We train the proposed models for 15 epochs and select the best model on the validation set.
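The reported hyperparameters can be collected into a single configuration, which is useful when checking a reimplementation against the paper. The sketch below is purely illustrative: the dictionary keys and the helper function are not from the authors' code, and only the numeric values and the L2 embedding loss come from the quoted setup.

```python
import math

# Hyperparameters reported for TeaserGen-LR; key names are illustrative,
# not taken from the authors' released code.
TEASERGEN_LR_CONFIG = {
    "transformer_layers": 3,
    "hidden_dim": 768,
    "batch_size": 16,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "epochs": 15,
    "loss": "L2 distance between ground-truth and generated image embeddings",
}

def l2_distance(a, b):
    """Euclidean (L2) distance between two embedding vectors,
    matching the loss described in the experiment setup."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy example on 4-dimensional embeddings (real embeddings would be 768-d).
d = l2_distance([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
```

Here `d` evaluates to √2, the distance between two orthogonal unit vectors; in training, this distance would be computed between each generated and ground-truth image embedding.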