DiffDVC: Accurate Event Detection for Dense Video Captioning via Diffusion Models

Authors: Wei Chen, Jianwei Niu, Xuefeng Liu, Zhendong Wang, Shaojie Tang, Guogang Zhu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on the ActivityNet-1.3, ActivityNet Captions, and YouCook2 datasets show DiffDVC achieving superior performance. To explore DiffDVC in detail, we conduct ablation studies using the ActivityNet Captions and YouCook2 datasets.
Researcher Affiliation | Academia | 1) State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China; 2) Zhongguancun Laboratory, Beijing, China; 3) Zhengzhou University Research Institute of Industrial Technology, Zhengzhou University, Zhengzhou, China; 4) Department of Management Science and Systems, University at Buffalo, Buffalo, New York, United States
Pseudocode | No | The paper describes its methods using mathematical equations and text, but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code, nor a link to a code repository for the described methodology.
Open Datasets | Yes | We perform experiments using the ActivityNet-1.3 (Caba Heilbron et al. 2015), ActivityNet Captions (Krishna et al. 2017), and YouCook2 (Zhou, Xu, and Corso 2018) datasets. The paper also gradually scales down ground-truth object boxes in the COCO validation dataset (Lin et al. 2014) and ground-truth event proposals in the THUMOS14 validation dataset (Idrees et al. 2017).
Dataset Splits | Yes | ActivityNet-1.3 has 10,024 training, 4,926 validation, and 5,044 testing videos. ActivityNet Captions includes 10,009 training, 4,917 validation, and 5,044 testing videos. YouCook2 contains 1,333 training, 457 validation, and 210 testing videos. Because the testing sets of these datasets are inaccessible, DiffDVC is evaluated on the validation sets, following previous methods.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify versions for any software libraries, frameworks, or programming languages.
Experiment Setup | Yes | For ActivityNet-1.3 and ActivityNet Captions, we configure the number of event proposals or queries N to be 15, while for YouCook2, N is set to 100. During inference, the number of sample steps in DDIM is set to 1. We train word embeddings with 512 dimensions from scratch. The signal scaling factor is 1.0. We apply the Adam optimizer, and the learning rate is initialized to 5e-5.
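For a reproduction attempt, the hyperparameters reported above could be collected in one place. The sketch below is illustrative only: the dictionary structure, key names, and the `num_queries` helper are assumptions, since the paper reports only the values and names no framework.

```python
# Hyperparameters reported for DiffDVC, gathered into a single config.
# Key names and structure are illustrative; the paper gives only the values.
CONFIG = {
    "num_queries": {             # number of event proposals/queries N
        "activitynet": 15,       # ActivityNet-1.3 and ActivityNet Captions
        "youcook2": 100,         # YouCook2
    },
    "ddim_sample_steps": 1,      # DDIM sampling steps at inference
    "word_embed_dim": 512,       # word embeddings trained from scratch
    "signal_scale": 1.0,         # diffusion signal scaling factor
    "optimizer": "adam",
    "init_lr": 5e-5,             # initial learning rate
}


def num_queries(dataset: str) -> int:
    """Look up the query count N for a dataset key (hypothetical helper)."""
    return CONFIG["num_queries"][dataset]
```

A reproduction script could read this config when building the proposal head and the optimizer, keeping dataset-specific values (here, N) out of the model code.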