ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition

Authors: Seungdong Yoa, Seungjun Lee, Hye-Seung Cho, Bumsoo Kim, Woohyung Lim

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct extensive experiments to demonstrate our retokenization strategy for efficient ViTs. We first show that our method outperforms the baselines by a substantial margin. Analytical experiments are conducted to further explain the factors contributing to our method's effectiveness. Notably, the hyper-speed inference experiment in Fig. 3 reveals that our method achieves relatively robust performance even with a drastically reduced number of tokens.
Researcher Affiliation | Collaboration | 1LG AI Research, 2Chung-Ang University, EMAIL, EMAIL
Pseudocode | No | The paper describes its methodology in natural language text and provides an architectural diagram, but it does not include explicitly structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | We conduct all the experiments on ImageNet-1k (Deng et al. 2009) consisting of 1.2 million images in the training set and 50k images in the test set.
Dataset Splits | Yes | We conduct all the experiments on ImageNet-1k (Deng et al. 2009) consisting of 1.2 million images in the training set and 50k images in the test set.
Hardware Specification | Yes | The throughput (img/s) is measured on a single NVIDIA GeForce RTX 3090 during inference.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages or libraries.
Experiment Setup | Yes | Implementation details. We conduct all the experiments on ImageNet-1k (Deng et al. 2009) consisting of 1.2 million images in the training set and 50k images in the test set. The image resolution is 224×224 in training and testing. During training our models, we simply follow all the training strategies and optimization methods used in DeiT (Touvron et al. 2021a). We train our model from scratch for 300 epochs, and we don't use any tricks (e.g., adding extra parameters, starting from an existing checkpoint or fine-tuning, using additional training tricks), unlike other prior works. The throughput (img/s) is measured on a single NVIDIA GeForce RTX 3090 during inference. For our method to apply the local coherence bias module, we adopt simple convolutional layers (four 3×3 convolutions and a single 1×1 convolution), replacing a standard ViT's patchify stem. In all experiments, we set the proportion p of the non-semantic token set to 0.3, targeting the bottom 30% of tokens based on their importance (attentiveness). Therefore, these tokens are candidates for retokenization. We set the similarity merging ratio to 0.08, meaning that token pairs equivalent to 0.08× the total number of tokens from the non-semantic token set are selected for merging based on their similarity. Also, we set the pruning ratio r to 0.8, which results in discarding the bottom 20% of tokens, identified as non-semantic, after retokenization by ImagePiece. These hyperparameters are specifically adjusted to further accelerate ViTs beyond the standard settings.
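The token-budget arithmetic implied by the quoted hyperparameters (p = 0.3, merge ratio 0.08, pruning ratio r = 0.8) can be sketched as follows. This is a hypothetical helper for sanity-checking the counts, not the authors' implementation; the function name and the reading of each ratio (merge ratio applied to the total token count, r as the fraction of tokens kept) are assumptions based on the excerpt above.

```python
def token_budget(num_tokens, p=0.3, merge_ratio=0.08, r=0.8):
    """Approximate token counts at each retokenization stage.

    Returns (non-semantic candidates, merged pairs, tokens kept after pruning).
    """
    # Bottom 30% of tokens by attentiveness are retokenization candidates.
    non_semantic = int(num_tokens * p)
    # Pairs equal to 0.08x the total token count are merged by similarity;
    # each merged pair collapses two tokens into one.
    merged_pairs = int(num_tokens * merge_ratio)
    after_merge = num_tokens - merged_pairs
    # Pruning ratio r = 0.8 keeps the top 80%, discarding the bottom 20%.
    kept = int(after_merge * r)
    return non_semantic, merged_pairs, kept

# A 224x224 image with 16x16 patches yields 14 * 14 = 196 tokens.
print(token_budget(196))
```

Under these assumptions, roughly a quarter of the original 196 tokens are removed per retokenization stage (15 by merging, a further 20% of the remainder by pruning), which is consistent with the paper's stated goal of accelerating ViTs beyond standard settings.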