VIoTGPT: Learning to Schedule Vision Tools Towards Intelligent Video Internet of Things

Authors: Yaoyao Zhong, Mengshi Qi, Rui Wang, Yuhan Qiu, Yang Zhang, Huadong Ma

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quantitative and qualitative experiments and analyses demonstrate the effectiveness of VIoTGPT. We believe VIoTGPT contributes to improving human-centered experiences in VIoT applications.
Researcher Affiliation | Academia | State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, China
Pseudocode | No | The paper describes its methods in natural language and formulas, and Figure 2 illustrates the process flow; however, it contains no clearly labeled 'Pseudocode' or 'Algorithm' block with structured, code-like steps.
Open Source Code | Yes | Project page: https://github.com/zhongyy/VIoTGPT
Open Datasets | Yes | To support VIoTGPT and related future works, we meticulously crafted the VIoT-Tool dataset, including the training dataset and the benchmark involving 11 representative vision models across three categories based on semi-automatic annotations. ... The dataset, named VIoT-Tool, will be publicly available to promote further research.
Dataset Splits | Yes | We collect the training dataset with training instructions (2.79 billion tokens) constructed by 200K pairs (p_i, A_{i,t}) related to the 11 tools across three categories, and the corresponding testing datasets with 1,841 pairs. ... In the training process, the instruction datasets are randomly divided into training and evaluating sets in a 49:1 proportion.
Hardware Specification | Yes | All the experiments are conducted on NVIDIA RTX 4090 GPUs.
Software Dependencies | No | The paper mentions using a parameter-efficient tuning method (LoRA) and following the settings of FastChat, but does not provide specific version numbers for software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | To enable training, a parameter-efficient tuning method, i.e., Low-Rank Adaptation (LoRA) (Hu et al. 2022), is used. Specifically, we attach the LoRA modules to the query and key of self-attention layers, with the rank parameter 8, the scaling alpha parameter 16, and the dropout rate 0.05, following the settings of FastChat (Zheng et al. 2023). The maximum length of new tokens is 2,048. We finetune LLMs using an effective batch size of 256 and a learning rate of 5e-5 for 6 epochs with the AdamW optimizer.
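The 49:1 train/eval split reported above can be sketched as a simple random partition. This is an illustrative toy only; the seed and exact shuffling procedure are assumptions, not taken from the paper.

```python
import random

def split_instructions(pairs, eval_parts=1, train_parts=49, seed=0):
    """Randomly divide instruction pairs into training and evaluating
    sets at a 49:1 ratio, mirroring the split described in the paper.
    The seed and shuffle-then-slice procedure are assumptions."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_eval = len(shuffled) * eval_parts // (train_parts + eval_parts)
    return shuffled[n_eval:], shuffled[:n_eval]

# With the 200K training pairs mentioned in the report, this yields
# 196,000 training and 4,000 evaluating examples.
train, evals = split_instructions(range(200_000))
print(len(train), len(evals))  # 196000 4000
```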
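The LoRA setup in the experiment row (rank r = 8, scaling alpha = 16, modules attached to the query/key projections) follows the standard update rule y = W x + (alpha / r) * B (A x). A minimal pure-Python sketch of that rule, assuming list-based toy matrices rather than any particular framework (the paper's actual implementation, e.g. a peft-style library, is not specified):

```python
def matvec(M, x):
    """Multiply a matrix M (list of rows) by a vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, r=8, alpha=16):
    """Frozen base projection W x plus the low-rank LoRA path
    (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)                # frozen pretrained weight
    delta = matvec(B, matvec(A, x))    # low-rank update B (A x)
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# At initialization B is zero, so the LoRA path contributes nothing
# and the output equals the frozen base model's output.
W = [[1.0, 0.0], [0.0, 1.0]]   # toy 2x2 base weight
A = [[0.5, -0.5]]              # r x d_in, toy rank r = 1
B = [[0.0], [0.0]]             # d_out x r, zero-initialized
x = [2.0, 3.0]
print(lora_forward(W, A, B, x, r=1, alpha=2))  # [2.0, 3.0]
```

Attaching such modules only to the query and key projections (as the paper does, with dropout 0.05 on the LoRA path) keeps the trainable parameter count a small fraction of the full model.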