Asymmetric Factorized Bilinear Operation for Vision Transformer

Authors: Junjie Wu, Qilong Wang, Jiangtao Xie, Pengfei Zhu, Qinghua Hu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments are conducted with twenty ViTs on various tasks, and the results show our AFBO is superior to its counterparts while improving existing ViTs in terms of generalization and robustness. ... To evaluate our AFBO, experiments are conducted on various vision tasks (i.e., image classification on ImageNet-1K ... object detection and instance segmentation on MS COCO) with twenty ViT models... Finally, we conduct ablation studies on ImageNet-1K."
Researcher Affiliation | Academia | "Junjie Wu1, Qilong Wang1, Jiangtao Xie2, Pengfei Zhu1, Qinghua Hu1 (1 Tianjin University; 2 Dalian University of Technology). EMAIL, EMAIL"
Pseudocode | No | The paper describes the proposed method, AFBO, using textual descriptions and mathematical formulations (Eqs. 1-7) along with diagrams (Figs. 1 and 2), but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "All programs are implemented by PyTorch (Paszke et al., 2019) and run on a server with 8 A6000 GPUs. The source code is available at https://github.com/XavierHeart/AFBO."
Open Datasets | Yes | "To evaluate our AFBO, experiments are conducted on various vision tasks (i.e., image classification on ImageNet-1K (Krizhevsky et al., 2017) and its out-of-distribution variants (Hendrycks et al., 2021b; Hendrycks & Dietterich, 2019; Hendrycks et al., 2021a; Recht et al., 2019), object detection and instance segmentation on MS COCO (Lin et al., 2014))" and "All models are evaluated on four GLUE (Wang et al., 2019) benchmark tasks" and "fine-tune the pre-trained backbones of Swin Transformer (Liu et al., 2021) and VisionLLaMA (Chu et al., 2024) with our AFBO on iNat2019 (Horn et al., 2018)."
Dataset Splits | Yes | "To train models on ImageNet-1K, we adopt exactly the same strategies as the original works with 224×224 inputs. ... All detectors are implemented using MMDetection toolkit (Chen et al., 2019) with the default settings. ... All models are evaluated on four GLUE (Wang et al., 2019) benchmark tasks... fine-tune the pre-trained backbones of Swin Transformer (Liu et al., 2021) and VisionLLaMA (Chu et al., 2024) with our AFBO on iNat2019 (Horn et al., 2018)."
Hardware Specification | Yes | "All programs are implemented by PyTorch (Paszke et al., 2019) and run on a server with 8 A6000 GPUs." and "In this subsection, we conduct experiments by using EdgeViT (Chen et al., 2022) (i.e., a well-known lightweight model designed for mobile devices) and comparing it on an Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz."
Software Dependencies | No | "All programs are implemented by PyTorch (Paszke et al., 2019) and run on a server with 8 A6000 GPUs. The source code is available at https://github.com/XavierHeart/AFBO." and "All detectors are implemented using MMDetection toolkit (Chen et al., 2019) with the default settings." (No specific version numbers are provided for PyTorch or MMDetection in the text.)
Experiment Setup | Yes | "For evaluation on object detection and instance segmentation, we adopt Mask R-CNN (He et al., 2017) and RetinaNet (Lin et al., 2020) as baseline detectors... All detectors are implemented using MMDetection toolkit (Chen et al., 2019) with the default settings. Specifically, the shorter side of input images is resized to 800, and all the models are optimized using SGD with weight decay of 1e-4, momentum of 0.9 and mini-batch size of 16. The learning rate is initialized to 0.01 and is decreased by a factor of 10 after 8 and 11 epochs, respectively."
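The detector training schedule quoted above (initial learning rate 0.01, divided by 10 after epochs 8 and 11) is the standard MMDetection step-decay recipe. A minimal sketch of that schedule, assuming a 12-epoch run (the usual "1x" schedule; the quoted text does not state the total epoch count) and using a hypothetical helper name:

```python
# Sketch of the step-decay learning-rate schedule described for the
# MS COCO detectors: base LR 0.01, multiplied by 0.1 at each milestone.
# `step_decay_lr` is an illustrative helper, not code from the paper.

def step_decay_lr(epoch, base_lr=0.01, milestones=(8, 11), gamma=0.1):
    """Return the learning rate in effect during `epoch` (0-indexed)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma  # decrease by a factor of 10 at each milestone
    return lr

# Learning rate across an assumed 12-epoch ("1x") schedule:
schedule = [step_decay_lr(e) for e in range(12)]
print(schedule)  # 0.01 for epochs 0-7, then ~0.001, then ~0.0001
```

In PyTorch this corresponds to `torch.optim.SGD(params, lr=0.01, momentum=0.9, weight_decay=1e-4)` combined with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1)`.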