PlanLLM: Video Procedure Planning with Refinable Large Language Models

Authors: Dejie Yang, Zijing Zhao, Yang Liu

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our PlanLLM achieves superior performance on three benchmarks, demonstrating the effectiveness of our designs." Code: https://github.com/idejie/PlanLLM |
| Researcher Affiliation | Academia | 1. Wangxuan Institute of Computer Technology, Peking University; 2. State Key Laboratory of General Artificial Intelligence, Peking University |
| Pseudocode | No | No explicit pseudocode or algorithm blocks appear in the main body of the paper. |
| Open Source Code | Yes | Code: https://github.com/idejie/PlanLLM |
| Open Datasets | Yes | "We employ three commonly used video datasets: CrossTask (Zhukov et al. 2019), NIV (Alayrac et al. 2016), and COIN (Tang et al. 2019)." |
| Dataset Splits | No | The paper uses CrossTask, NIV, and COIN but does not report train/validation/test split percentages, sample counts, or how the datasets were partitioned for its experiments. |
| Hardware Specification | Yes | "...training the model with a batch size of 32 on NVIDIA A800 GPUs." |
| Software Dependencies | No | The paper mentions the S3D network, CLIP, BLIP-2, Vicuna-7B, and LoRA, but provides no version numbers for these components or libraries. |
| Experiment Setup | Yes | "During the frozen LLM training stage, we set the learning rate to 1×10⁻⁴ for the Q-Former and 1×10⁻³ for other modules, training the model with a batch size of 32 on NVIDIA A800 GPUs." |
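
Because the paper names Vicuna-7B with LoRA but pins no versions or adapter settings, the sketch below shows one plausible way to attach LoRA adapters using Hugging Face PEFT. The model id and the rank/alpha/target-module choices are assumptions for illustration, not values reported by the authors.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical sketch: LoRA adapters on a Vicuna-7B backbone via PEFT.
# The hyperparameters below are common defaults, not the paper's values.
base = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")
lora = LoraConfig(
    r=16,                                  # assumed adapter rank
    lora_alpha=32,                         # assumed scaling factor
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    lora_dropout=0.05,
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapters train; the LLM stays frozen
```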
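
The reported experiment setup, with one learning rate for the Q-Former and another for the remaining trainable modules, maps naturally onto PyTorch optimizer parameter groups. A minimal sketch, assuming placeholder module names that stand in for the actual PlanLLM components:

```python
import torch
from torch import nn

# Placeholder model: `qformer` and `other_modules` are illustrative names,
# not identifiers from the PlanLLM codebase.
class PlanModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.qformer = nn.Linear(256, 256)        # stands in for the Q-Former
        self.other_modules = nn.Linear(256, 256)  # stands in for other trainable modules

model = PlanModel()

# Two parameter groups with the learning rates the paper reports:
# 1e-4 for the Q-Former, 1e-3 for everything else (batch size 32).
optimizer = torch.optim.AdamW([
    {"params": model.qformer.parameters(), "lr": 1e-4},
    {"params": model.other_modules.parameters(), "lr": 1e-3},
])
```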