VIoTGPT: Learning to Schedule Vision Tools Towards Intelligent Video Internet of Things
Authors: Yaoyao Zhong, Mengshi Qi, Rui Wang, Yuhan Qiu, Yang Zhang, Huadong Ma
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitative and qualitative experiments and analyses demonstrate the effectiveness of VIoTGPT. We believe VIoTGPT contributes to improving human-centered experiences in VIoT applications. |
| Researcher Affiliation | Academia | 1State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, China EMAIL |
| Pseudocode | No | The paper describes methods using natural language and formulas, and Figure 2 illustrates a process flow. However, it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps in a code-like format. |
| Open Source Code | Yes | Projects https://github.com/zhongyy/VIoTGPT |
| Open Datasets | Yes | To support VIoTGPT and related future works, we meticulously crafted the VIoT-Tool dataset, including the training dataset and the benchmark involving 11 representative vision models across three categories based on semi-automatic annotations. ... The dataset, named VIoT-Tool, will be publicly available to promote further research. |
| Dataset Splits | Yes | we collect the training dataset with training instructions (2.79 billion tokens) constructed by 200K pairs (p_i and A_{i,t}) related to the 11 tools across three categories and the corresponding testing datasets with 1,841 pairs. ... In the training process, the instruction datasets are randomly divided into training and evaluating sets in a 49:1 proportion. |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions using a parameter-efficient tuning method (LoRA) and following settings of FastChat, but does not provide specific version numbers for software dependencies such as programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | To enable training, a parameter-efficient tuning method, i.e., Low-Rank Adaptation (LoRA) (Hu et al. 2022), is used. Specifically, we attach the LoRA modules to the query and key of self-attention layers, with the rank parameter 8, the scaling alpha parameter 16, and the dropout rate 0.05, following the settings of FastChat (Zheng et al. 2023). The maximum length of new tokens is 2,048. We finetune LLMs using an effective batch size of 256 and a learning rate of 5e-5 for 6 epochs with the AdamW optimizer. |
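The LoRA hyperparameters quoted above (rank 8, scaling alpha 16, applied to the query/key projections) follow a standard low-rank update: the frozen weight W is augmented by a trainable product B·A scaled by alpha/r. A minimal pure-Python sketch of that update, with toy dimensions and random weights that are illustrative only (not the paper's model):

```python
# Sketch of the LoRA update described in the setup: y = W x + (alpha/r) * B(A x),
# with rank r=8 and alpha=16 as reported in the paper. Toy sizes, hedged example.
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def lora_forward(W, A, B, x, r=8, alpha=16):
    """Frozen base projection plus the scaled low-rank LoRA branch."""
    base = matvec(W, x)                     # frozen pretrained weight
    low_rank = matvec(B, matvec(A, x))      # trainable rank-r update
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]

random.seed(0)
d, r = 16, 8                                # toy hidden size; rank from the paper
W = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(d)]  # frozen
A = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(r)]  # trainable
B = [[0.0] * r for _ in range(d)]           # zero-init: update starts as a no-op
x = [1.0] * d

y = lora_forward(W, A, B, x)
```

With B initialized to zero (the usual LoRA initialization), the adapted layer initially reproduces the frozen projection exactly; training only updates A and B, which is what makes the method parameter-efficient.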