PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Authors: Weikang Meng, Yadan Luo, Xin Li, Dongmei Jiang, Zheng Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed PolaFormer improves performance on various vision tasks, enhancing both expressiveness and efficiency by up to 4.6%. In this section, we evaluate our PolaFormer model on three tasks: image classification on ImageNet-1K (Deng et al., 2009), object detection and instance segmentation on COCO (Lin et al., 2014), and semantic segmentation on ADE20K (Zhou et al., 2019), comparing its performance with previous efficient vision models. Additionally, we assess PolaFormer on the Long Range Arena (LRA) task (Tay et al., 2021) to compare against other linear attention models. |
| Researcher Affiliation | Academia | ¹ Harbin Institute of Technology, Shenzhen, China; ² Pengcheng Laboratory, China; ³ UQMM Lab, University of Queensland, Australia |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present any structured, code-like procedures. |
| Open Source Code | Yes | Code is available at https://github.com/ZacharyMeng/PolaFormer. |
| Open Datasets | Yes | We evaluate our PolaFormer model on three tasks: image classification on ImageNet-1K (Deng et al., 2009), object detection and instance segmentation on COCO (Lin et al., 2014), and semantic segmentation on ADE20K (Zhou et al., 2019), comparing its performance with previous efficient vision models. Additionally, we assess PolaFormer on the Long Range Arena (LRA) task (Tay et al., 2021) to compare against other linear attention models. |
| Dataset Splits | Yes | The ImageNet-1K (Deng et al., 2009) dataset is the widely used dataset for image classification tasks, containing 1,000 categories and over 1.2 million training images. We further validate the effectiveness of the proposed approach across various vision tasks, including the object detection task on the COCO dataset (Lin et al., 2014), which contains over 118K training images and 5K validation images. |
| Hardware Specification | Yes | The models were pretrained on 8 NVIDIA A800 GPUs and fine-tuned on 8 NVIDIA RTX A6000 and 8 NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions several software components and projects like the 'AdamW optimizer', 'Swin Transformer implementation made by Microsoft', 'mmcv-detection (Contributors, 2018) project', 'mmcv-segmentation (Contributors, 2018) project', and 'Skyformer (Chen et al., 2021)'. However, it does not provide specific version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | In this task, we use the AdamW optimizer (Loshchilov & Hutter, 2019) to train all of our models for 400 epochs, including 20 epochs for linear warm-up. The basic learning rate is set to 1e-3 for a 1024 batch size. Additionally, we use a weight decay of 5e-2. For the PVT model, we select from RetinaNet and Mask R-CNN as detectors, with the schedule set to 1×. For the Swin model, we choose between Mask R-CNN and Cascade Mask R-CNN as detectors, where models using Mask R-CNN are experimented with under both 1× and 3× schedule settings, while models using Cascade Mask R-CNN are trained under the 3× schedule. The training epoch is set to 12 per 1× schedule, and we use the AdamW optimizer with a learning rate of 1e-4 and a weight decay of 1e-4. For ListOps and Text Classification, we set the batch size to 32 with a 1e-4 learning rate. For Pathfinder, we set the batch size to 128 with a 5e-4 learning rate. For Image Classification, we set the batch size to 256 with a 1e-4 learning rate. For the Retrieval sub-task, we set the batch size to 16 with a 2e-4 learning rate. All models are trained from scratch using the AdamW optimizer. |
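The classification recipe quoted in the Experiment Setup row (AdamW, 400 epochs with 20 linear warm-up epochs, base learning rate 1e-3 at batch size 1024, weight decay 5e-2) can be sketched as a per-epoch learning-rate function. This is an illustrative reconstruction, not the authors' code: the paper specifies only the linear warm-up, so the cosine decay after warm-up is an assumption based on common practice.

```python
import math

def lr_at_epoch(epoch, total_epochs=400, warmup_epochs=20,
                base_lr=1e-3, min_lr=0.0):
    """Learning rate for a given epoch: linear warm-up for the first
    `warmup_epochs`, then an *assumed* cosine decay to `min_lr`."""
    if epoch < warmup_epochs:
        # Linear ramp from 0 up to base_lr over the warm-up phase.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay is an assumption; the paper states only the warm-up.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With the quoted linear-scaling convention, the base rate of 1e-3 is tied to a batch size of 1024; a different batch size would typically rescale `base_lr` proportionally.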
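For context on the attention family this report concerns: linear attention replaces the softmax with a kernel feature map so the key-value summary can be precomputed, reducing cost from O(N²d) to O(Nd²). The sketch below uses a generic elu(x)+1 feature map, which is a common baseline choice; it is not PolaFormer's polarity-aware map, whose construction is specific to the paper.

```python
import numpy as np

def feature_map(x):
    # Generic positive feature map elu(x) + 1 (a common baseline);
    # PolaFormer's actual polarity-aware map differs from this.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """O(N d^2) attention: phi(Q) @ (phi(K)^T V), row-normalized,
    equivalent to kernelized attention without materializing N x N."""
    q, k = feature_map(Q), feature_map(K)
    kv = k.T @ V            # (d, d_v) summary over all keys/values
    z = q @ k.sum(axis=0)   # (N,) per-query normalizer
    return (q @ kv) / (z[:, None] + eps)
```

The associativity trick is the whole point: `q @ (k.T @ V)` gives the same result as normalizing the full `q @ k.T` attention matrix, but never builds it.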