SPFormer: Enhancing Vision Transformer with Superpixel Representation
Authors: Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation of SPFormer on the ImageNet dataset demonstrates its superior efficiency and performance over the DeiT baseline under varying configurations, as shown in Tab. 1. Specifically, SPFormer-S, which employs the standard ViT configuration with 196 tokens, exceeds the performance of DeiT-S by 1.1%, achieving a top-1 accuracy of 81.0% compared to 79.9% for DeiT-S. Furthermore, SPFormer-T outperforms DeiT-T by 1.4%, recording 73.6% versus 72.2%. Table 2: Ablation study on the design choices in SPFormer. Table 3: Semantic segmentation on ADE20K val split. Table 4: Semantic segmentation on Pascal Context val split. Table 5: Evaluation of superpixel quality in a zero-shot setting on Pascal VOC 2012 and Pascal-Parts-58 datasets, using 196 patches/superpixels. Table 6: Quantitative evaluation of SPFormer's robustness to rotation, comparing performance at different angles. |
| Researcher Affiliation | Collaboration | Jieru Mei (EMAIL), Department of Computer Science, Johns Hopkins University; Liang-Chieh Chen (EMAIL), ByteDance; Alan Yuille (EMAIL), Department of Computer Science, Johns Hopkins University; Cihang Xie (EMAIL), Department of Computer Science and Engineering, University of California, Santa Cruz |
| Pseudocode | No | The paper describes the mechanisms of SCA and iterative feature refinement using equations (Eq. 1, 3, 4) and textual descriptions, but no clearly labeled "Pseudocode" or "Algorithm" block is present. |
| Open Source Code | No | The paper does not provide an explicit statement or a direct link to the source code for the methodology described in this paper. Footnote 1 refers to official code for a different paper (SViT), not SPFormer. |
| Open Datasets | Yes | Our evaluation of SPFormer on the ImageNet dataset demonstrates its superior efficiency and performance over the DeiT baseline under varying configurations, as shown in Tab. 1. All models train on the ImageNet dataset (Russakovsky et al., 2015) for 300 epochs. Furthermore, we assess the generalizability of our superpixel representation using the COCO dataset (Lin et al., 2014). We evaluate SPFormer on the ADE20K (Zhou et al., 2017) and Pascal Context (Mottaghi et al., 2014) datasets. This test involved a quantitative analysis on both object and part levels using the Pascal VOC 2012 dataset (Everingham et al., 2015) and Pascal-Part-58 (Zhao et al., 2019). |
| Dataset Splits | Yes | All models train on the ImageNet dataset (Russakovsky et al., 2015) for 300 epochs. We evaluate SPFormer on the ADE20K (Zhou et al., 2017) and Pascal Context (Mottaghi et al., 2014) datasets. As shown in Tab. 3 and Tab. 4, the performance gains in mIoU are noteworthy when using ImageNet-pretrained models: 4.2% improvement on ADE20K and 2.8% on Pascal Context. Table 3: Semantic segmentation on ADE20K val split. Table 4: Semantic segmentation on Pascal Context val split. |
| Hardware Specification | No | The paper mentions "We thank the Center for AI Safety for supporting our computing needs." but does not specify any details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions using the "AdamW optimizer" and "Layer Scale technique" but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | Adhering to the protocols established in DeiT (Touvron et al., 2021a), we implement robust data augmentations, use the AdamW optimizer, and follow a cosine decay learning rate schedule. All models train on the ImageNet dataset (Russakovsky et al., 2015) for 300 epochs. During SPFormer-B/16 training, significant overfitting challenges arose. Increasing the Stochastic Depth (Huang et al., 2016) rate from 0.1 to 0.6 effectively addressed these issues. |
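The Experiment Setup row quotes a DeiT-style recipe: AdamW, a cosine decay learning-rate schedule, and 300 training epochs. A minimal sketch of such a schedule is shown below; the base learning rate, minimum learning rate, and warmup length are assumptions (typical DeiT defaults), since the report does not state them.

```python
import math

# Assumed hyperparameters (DeiT-style defaults, not given in the report).
BASE_LR = 5e-4
MIN_LR = 1e-5
WARMUP_EPOCHS = 5
TOTAL_EPOCHS = 300  # stated in the report


def cosine_lr(epoch: int) -> float:
    """Learning rate at a given epoch: linear warmup, then cosine decay.

    Warmup ramps from BASE_LR / WARMUP_EPOCHS up to BASE_LR, after which
    the rate follows a half-cosine from BASE_LR down to MIN_LR at the
    final epoch.
    """
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop this value would be written into each parameter group of a `torch.optim.AdamW` optimizer once per epoch; the stochastic depth rate mentioned for SPFormer-B/16 (raised from 0.1 to 0.6) is a separate per-layer drop probability applied inside the model, not part of the schedule above.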