ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling
Authors: Jianan Jiang, Hao Tang, Zhilin Jiang, Weiren Yu, Di Wu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments and Results Experimental Settings Clothes-V1. ... Our model is implemented using the PyTorch framework and is based on the ViT-B/16-1K model (Dosovitskiy et al. 2020). ... As presented in Tables 1 and 2, our proposed method exhibits superior performance compared to other baselines on three datasets. ... Ablation Study Framework. Table 3 represents that our dual weight-sharing networks can significantly enhance the overall performance of the model across various mainstream backbone networks. |
| Researcher Affiliation | Collaboration | Jianan Jiang1,4, Hao Tang2, Zhilin Jiang1, Weiren Yu3, Di Wu1,4 1Hunan University 2Peking University 3University of Warwick 4ExponentiAI Innovation EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology in narrative text and uses diagrams (Figure 4) to illustrate the framework, but does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/ExponentiAI/ARNet |
| Open Datasets | Yes | Additionally, we introduce a new dataset named Clothes-V1, filling the gap in professional fashion clothing datasets in this field. Its multi-level quality can be valuable for other computer vision tasks as well. ... Please refer to our GitHub for more details. ... We utilize two widely used FG-SBIR datasets QMUL-Chair-V2 and QMUL-Shoe-V2 (Yu et al. 2016) along with our proposed self-collected dataset Clothes-V1 to evaluate the performance of our proposed framework. |
| Dataset Splits | Yes | Clothes-V1 has 1200 (500) sketches (images), containing 925 (380) and 275 (120) for the training and validation set. ... The QMUL-Chair-V2 includes 964 (300) and 311 (100) sketches (images) for the training set and validation set, and the QMUL-Shoe-V2 includes 5982 (1800) and 666 (200) sketches (images) for the training set and validation set. |
| Hardware Specification | Yes | We train the model on a single NVIDIA 32GB Tesla V100 GPU |
| Software Dependencies | No | Our model is implemented using the PyTorch framework and is based on the ViT-B/16-1K model (Dosovitskiy et al. 2020). |
| Experiment Setup | Yes | The input size is 224×224, the final embedding vector's dimension is 512, and the temperature parameter τ is 0.07. We train the model on a single NVIDIA 32GB Tesla V100 GPU, using a batch size of 16 and the Adam optimizer (Kingma and Ba 2014) with a learning rate of 6e-6 and weight decay of 1e-4, and the training process lasts for 500 epochs. |
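The reported settings (512-d embedding vectors, temperature τ = 0.07, batch size 16) are consistent with a temperature-scaled contrastive objective between sketch and photo embeddings. The paper's exact loss formulation is not quoted in the table, so the following is only a minimal NumPy sketch of an InfoNCE-style loss under those assumed settings, with each sketch's positive photo at the same batch index:

```python
import numpy as np

def info_nce_loss(sketch_emb, photo_emb, tau=0.07):
    """Temperature-scaled contrastive (InfoNCE-style) loss between
    sketch and photo embeddings of shape (batch, dim)."""
    # L2-normalize so the dot product is cosine similarity.
    s = sketch_emb / np.linalg.norm(sketch_emb, axis=1, keepdims=True)
    p = photo_emb / np.linalg.norm(photo_emb, axis=1, keepdims=True)
    # Similarity logits scaled by the temperature tau.
    logits = s @ p.T / tau                       # shape (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Log-softmax over each row; the positive pair sits on the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.normal(size=(16, 512))   # batch size 16, 512-d embeddings
loss_matched = info_nce_loss(emb, emb)                        # aligned pairs
loss_random = info_nce_loss(emb, rng.normal(size=(16, 512)))  # mismatched pairs
print(loss_matched < loss_random)
```

With perfectly aligned pairs the diagonal logits dominate after scaling by 1/τ, so the loss is near zero, while random pairings stay close to log(batch size); the small τ = 0.07 sharpens the softmax and penalizes hard negatives more strongly.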