VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Authors: Ji Soo Lee, Jongha Kim, Jeehye Na, Jinyoung Park, Hyunwoo J. Kim
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our VidChain on two benchmarks, ActivityNet Captions and YouCook2, for the challenging DVC task, and ActivityNet Captions for temporal video grounding (TVG). In sum, our contributions are three-fold: ... |
| Researcher Affiliation | Academia | Department of Computer Science and Engineering, Korea University |
| Pseudocode | No | The paper describes methods and processes textually and with mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide any links to code repositories. |
| Open Datasets | Yes | We experiment on two different dense video captioning benchmarks, ActivityNet Captions (Krishna et al. 2017) and YouCook2 (Zhou, Xu, and Corso 2018). |
| Dataset Splits | Yes | We construct 10K and 1K samples for ActivityNet and YouCook2 respectively for each path using the pre-defined templates, where the templates are provided in the supplement. Note we refer to each of the two types of dataset as D_t→c and D_c→t, respectively. Then, we combine our obtained dataset with the DVC QA pairs and dialogues following VTimeLLM (Huang et al. 2024). Note that we adopt the full benchmark dataset unlike VTimeLLM, which only uses a selected subset for training. This results in D_CT of size 50K for ActivityNet and 6K for YouCook2. Overall, we use D_CT to finetune Video-LLMs, enhancing their performance on fine-grained video understanding tasks, including DVC and its sub-tasks. Further details are in the supplement. ... Visualization is done on ActivityNet validation set with VTimeLLM in P_c→t path. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks used in the implementation. |
| Experiment Setup | No | The paper mentions hyperparameters like 'β' and 'γ' but does not provide their specific values or other critical experimental setup details such as learning rate, batch size, optimizer configuration, or number of epochs. |