ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling

Authors: Jianan Jiang, Hao Tang, Zhilin Jiang, Weiren Yu, Di Wu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments and Results. Experimental Settings. Clothes-V1. ... Our model is implemented using the PyTorch framework and is based on the ViT-B/16-1K model (Dosovitskiy et al. 2020). ... As presented in Tables 1 and 2, our proposed method exhibits superior performance compared to other baselines on three datasets. ... Ablation Study: Framework. Table 3 shows that our dual weight-sharing networks can significantly enhance the overall performance of the model across various mainstream backbone networks."
Researcher Affiliation | Collaboration | Jianan Jiang (1,4), Hao Tang (2), Zhilin Jiang (1), Weiren Yu (3), Di Wu (1,4). (1) Hunan University, (2) Peking University, (3) University of Warwick, (4) ExponentiAI Innovation. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in narrative text and uses diagrams (Figure 4) to illustrate the framework, but does not present any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/ExponentiAI/ARNet
Open Datasets | Yes | "Additionally, we introduce a new dataset named Clothes-V1, filling the gap in professional fashion clothing datasets in this field. Its multi-level quality can be valuable for other computer vision tasks as well. ... Please refer to our GitHub for more details. ... We utilize two widely used FG-SBIR datasets, QMUL-Chair-V2 and QMUL-Shoe-V2 (Yu et al. 2016), along with our proposed self-collected dataset Clothes-V1 to evaluate the performance of our proposed framework."
Dataset Splits | Yes | "Clothes-V1 has 1200 (500) sketches (images), containing 925 (380) and 275 (120) for the training and validation sets. ... The QMUL-Chair-V2 includes 964 (300) and 311 (100) sketches (images) for the training and validation sets, and the QMUL-Shoe-V2 includes 5982 (1800) and 666 (200) sketches (images) for the training and validation sets."
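As a quick arithmetic check, the reported Clothes-V1 split sizes sum to the stated totals of 1200 sketches and 500 images. A minimal Python sketch; the dictionary layout and names are mine, not the paper's:

```python
# Reported Clothes-V1 split sizes (sketches, photos per split).
# Key names and structure are illustrative, not from the paper.
CLOTHES_V1_SPLITS = {
    "train": {"sketches": 925, "photos": 380},
    "val": {"sketches": 275, "photos": 120},
}

def totals(splits):
    """Sum sketch and photo counts across all splits."""
    return (
        sum(s["sketches"] for s in splits.values()),
        sum(s["photos"] for s in splits.values()),
    )

# 925 + 275 = 1200 sketches; 380 + 120 = 500 photos, matching the paper.
print(totals(CLOTHES_V1_SPLITS))
```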
Hardware Specification | Yes | "We train the model on a single NVIDIA 32GB Tesla V100 GPU."
Software Dependencies | No | "Our model is implemented using the PyTorch framework and is based on the ViT-B/16-1K model (Dosovitskiy et al. 2020)."
Experiment Setup | Yes | "The input size is 224×224, the final embedding vector's dimension is 512, and the temperature parameter τ is 0.07. We train the model on a single NVIDIA 32GB Tesla V100 GPU, using a batch size of 16 and the Adam optimizer (Kingma and Ba 2014) with a learning rate of 6e-6 and weight decay of 1e-4, and the training process lasts for 500 epochs."
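The reported 512-dimensional embeddings with a temperature parameter τ = 0.07 suggest a temperature-scaled contrastive objective. A minimal pure-Python sketch of that scaling step, using the hyperparameters quoted above; the config keys and helper names are hypothetical, not taken from the ARNet code:

```python
import math

# Hyperparameters as reported in the paper; key names are my own.
CONFIG = {
    "input_size": (224, 224),
    "embed_dim": 512,
    "temperature": 0.07,
    "batch_size": 16,
    "learning_rate": 6e-6,
    "weight_decay": 1e-4,
    "epochs": 500,
}

def cosine_similarity(a, b):
    """Plain-Python cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def scaled_logit(a, b, tau=CONFIG["temperature"]):
    """Temperature-scaled similarity logit, as used in contrastive losses."""
    return cosine_similarity(a, b) / tau

if __name__ == "__main__":
    sketch_emb = [1.0, 0.0, 0.0]   # toy 3-d stand-ins for 512-d embeddings
    photo_emb = [1.0, 0.0, 0.0]
    # Identical vectors: cosine similarity 1.0, so logit = 1 / 0.07 ≈ 14.29
    print(scaled_logit(sketch_emb, photo_emb))
```

A small τ such as 0.07 sharpens the softmax over these logits, which is why it is a commonly reported hyperparameter in contrastive retrieval setups.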