IFORMER: INTEGRATING CONVNET AND TRANSFORMER FOR MOBILE APPLICATION
Authors: Chuanyang Zheng
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments demonstrating that iFormer outperforms existing lightweight networks across various tasks. Notably, iFormer achieves an impressive Top-1 accuracy of 80.4% on ImageNet-1K with a latency of only 1.10 ms on an iPhone 13, surpassing the recently proposed MobileNetV4 under similar latency constraints. Additionally, our method shows significant improvements in downstream tasks, including COCO object detection, instance segmentation, and ADE20k semantic segmentation, while still maintaining low latency on mobile devices for high-resolution inputs in these scenarios. |
| Researcher Affiliation | Academia | Chuanyang Zheng Independent Researcher chuanyang EMAIL |
| Pseudocode | No | The paper includes equations (1), (2), and (3) to formally describe the modulation mechanism and SHMA, and diagrams in Figure 4 illustrating the architecture. However, it does not contain any sections explicitly labeled "Pseudocode" or "Algorithm", nor structured steps formatted like code or an algorithm. |
| Open Source Code | Yes | Code and models are available at: https://github.com/ChuanyangZheng/iFormer. |
| Open Datasets | Yes | We first evaluate our models on classification on ImageNet-1K (Deng et al., 2009). ... downstream tasks, including COCO object detection, instance segmentation, and ADE20k semantic segmentation. |
| Dataset Splits | Yes | We first evaluate our models on classification on ImageNet-1K (Deng et al., 2009). To ensure a fair comparison with prior studies, we follow the previous training recipe (Touvron et al., 2021a; Liu et al., 2022) and train all models for 300 epochs with a standard image size of 224x224. ... we train Mask R-CNN (He et al., 2017) with iFormer as the backbone for 12 epochs (1×), using the MMDetection toolkit (Chen et al., 2019). ... We conduct experiments on the ADE20K (Zhou et al., 2017) using the Semantic FPN (Kirillov et al., 2019), based on the MMSegmentation toolkit (Contributors, 2020). |
| Hardware Specification | Yes | Notably, iFormer achieves an impressive Top-1 accuracy of 80.4% on ImageNet-1K with a latency of only 1.10 ms on an iPhone 13... The latency is measured on an iPhone 13. ... measured on an actual iPhone 13 and compiled by Core ML Tools (Core ML)... |
| Software Dependencies | No | The paper mentions 'Core ML Tools (Core ML)', 'MMDetection toolkit (Chen et al., 2019)', and 'MMSegmentation toolkit (Contributors, 2020)'. While toolkits are named and cited, no specific version numbers for any of these software dependencies are provided in the main text. |
| Experiment Setup | No | The paper mentions training models for "300 epochs with a standard image size of 224x224" for ImageNet-1K, and "12 epochs" for Mask R-CNN on COCO. It states that it follows "the previous training recipe (Touvron et al., 2021a; Liu et al., 2022)" for ImageNet, and that it adds "drop path and layer scale" for larger models, which are "commonly used". However, it defers detailed hyperparameters such as learning rates, batch sizes, specific optimizers, and other system-level settings to these external references or common practices, without explicitly listing them in the main text. |