ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Authors: Bencheng Liao, Xinggang Wang, Lianghui Zhu, Qian Zhang, Chang Huang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to validate the effectiveness of our proposed models. We present the main results on ImageNet (Deng et al. 2009). Additionally, we benchmark our model on downstream dense prediction tasks, including object detection on the COCO (Lin et al. 2014) dataset and semantic segmentation on ADE20K (Zhou et al. 2019). |
| Researcher Affiliation | Collaboration | Bencheng Liao1, 2, Xinggang Wang2, *, Lianghui Zhu2, Qian Zhang3, Chang Huang3 1Institute of Artificial Intelligence, Huazhong University of Science & Technology 2School of EIC, Huazhong University of Science & Technology 3Horizon Robotics EMAIL, EMAIL |
| Pseudocode | No | The paper describes the Gated Linear Attention (GLA) and Bidirectional Gated Linear Attention (BiGLA) mechanisms using mathematical formulas and textual descriptions, but it does not include a structured pseudocode block or algorithm. |
| Open Source Code | Yes | Code: https://github.com/hustvl/ViG |
| Open Datasets | Yes | We present the main results on ImageNet (Deng et al. 2009). Additionally, we benchmark our model on downstream dense prediction tasks, including object detection on the COCO (Lin et al. 2014) dataset and semantic segmentation on ADE20K (Zhou et al. 2019). |
| Dataset Splits | No | The paper states: "We train classification experiments on ImageNet-1K dataset... We mainly follow the training and evaluation setting of DeiT and Swin Transformer (Touvron et al. 2021; Liu et al. 2021b). All the models are trained from scratch for 300 epochs. Further details are provided in extended version." While it refers to existing benchmarks and training settings, it does not explicitly provide the specific dataset split percentages or counts within the main text. |
| Hardware Specification | Yes | Tp. (images/s) is measured on a single 4090 GPU with batch size 256 following (Liu et al. 2021b). ... Throughput and memory are tested on a 4090 GPU with batch size 256 and image size 224. |
| Software Dependencies | No | The paper does not explicitly mention any specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | All the models are trained from scratch for 300 epochs. ... Tp. (images/s) is measured on a single 4090 GPU with batch size 256... Training details are the same as VRWKV (Duan et al. 2024) and Vim (Zhu et al. 2024). |
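Since the paper provides no pseudocode for the GLA/BiGLA mechanisms it describes with formulas, the following is a minimal NumPy sketch of the standard gated linear attention recurrence (per-step state update `S_t = diag(α_t) S_{t-1} + k_t v_t^T`, output `o_t = q_t S_t`), plus a hypothetical bidirectional variant that sums a forward and a reversed pass. The gating shapes and the BiGLA fusion rule here are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def gated_linear_attention(Q, K, V, alpha):
    """Recurrent form of gated linear attention (GLA).

    Per step t: S_t = diag(alpha_t) @ S_{t-1} + outer(k_t, v_t)
                o_t = q_t @ S_t
    Cost is linear in sequence length L; the state S is (d_k, d_v).
    Q, K: (L, d_k); V: (L, d_v); alpha: (L, d_k) gates in [0, 1].
    """
    L, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.zeros((L, d_v))
    for t in range(L):
        # Decay the running state per key dimension, then accumulate.
        S = alpha[t][:, None] * S + np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out

def bidirectional_gla(Q, K, V, alpha_f, alpha_b):
    """Hypothetical BiGLA sketch: a forward scan plus a scan over the
    reversed sequence, summed. The paper's actual fusion may differ."""
    fwd = gated_linear_attention(Q, K, V, alpha_f)
    bwd = gated_linear_attention(Q[::-1], K[::-1], V[::-1], alpha_b)[::-1]
    return fwd + bwd
```

With all gates set to 1 this reduces to unnormalized linear attention, `o_t = q_t (sum_{s<=t} k_s v_s^T)`, which is a convenient sanity check for the recurrence.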