PlanLLM: Video Procedure Planning with Refinable Large Language Models
Authors: Dejie Yang, Zijing Zhao, Yang Liu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our PlanLLM achieves superior performance on three benchmarks, demonstrating the effectiveness of our designs. Code: https://github.com/idejie/PlanLLM |
| Researcher Affiliation | Academia | 1 Wangxuan Institute of Computer Technology, Peking University 2 State Key Laboratory of General Artificial Intelligence, Peking University EMAIL, EMAIL |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the main body of the paper. |
| Open Source Code | Yes | Code: https://github.com/idejie/PlanLLM |
| Open Datasets | Yes | We employ three commonly used video datasets: CrossTask (Zhukov et al. 2019), NIV (Alayrac et al. 2016), and COIN (Tang et al. 2019). |
| Dataset Splits | No | The paper mentions using three commonly used video datasets (CrossTask, NIV, and COIN) but does not provide specific training/testing/validation split percentages, sample counts, or explicit references to how these datasets were partitioned for the experiments in the main text. |
| Hardware Specification | Yes | training the model with a batch size of 32 on NVIDIA A800 GPUs. |
| Software Dependencies | No | The paper mentions using S3D network, CLIP, BLIP2, Vicuna-7B, and LoRA, but does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | During the frozen LLM training stage, we set the learning rate to 1×10⁻⁴ for the Q-Former and 1×10⁻³ for other modules, training the model with a batch size of 32 on NVIDIA A800 GPUs. |
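The two-learning-rate setup quoted in the Experiment Setup row can be sketched as a configuration fragment. This is a minimal sketch assuming a PyTorch-style parameter-group layout; the module names (`q_former`, `other_modules`) and the dict structure are illustrative assumptions, not taken from the paper's released code.

```python
# Hypothetical sketch of the frozen-LLM training stage reported above:
# per-module learning rates expressed as parameter groups, plus the
# reported batch size and hardware. Module names are assumptions.
frozen_llm_stage = {
    "batch_size": 32,           # as reported in the paper
    "hardware": "NVIDIA A800",  # as reported in the paper
    "param_groups": [
        {"modules": "q_former", "lr": 1e-4},        # 1x10^-4 for the Q-Former
        {"modules": "other_modules", "lr": 1e-3},   # 1x10^-3 for other modules
    ],
}

# In a real training script these groups would be resolved to parameters
# and passed to an optimizer, e.g. torch.optim.AdamW(param_groups).
print(frozen_llm_stage["param_groups"][0]["lr"])  # 0.0001
```

Per-parameter-group learning rates are the standard way in PyTorch to train different submodules (here, the Q-Former versus the rest of the trainable stack) at different rates within a single optimizer.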