Modeling Object Dissimilarity for Deep Saliency Prediction

Authors: Bahar Aydemir, Deblina Bhattacharjee, Tong Zhang, Seungryong Kim, Mathieu Salzmann, Sabine Süsstrunk

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental As evidenced by our experiments, this consistently boosts the accuracy of the baseline networks, enabling us to outperform the state-of-the-art models on three saliency benchmarks, namely SALICON, MIT300 and CAT2000. Our experiments on the SALICON (Jiang et al., 2015), MIT1003 (Judd et al., 2009) and CAT2000 (Borji & Itti, 2015) benchmarks demonstrate that our approach consistently improves the results of the baseline saliency networks we build on.
Researcher Affiliation Academia 1School of Computer and Communication Sciences, EPFL, Switzerland, 2Department of Computer Science and Engineering, Korea University, South Korea
Pseudocode No The paper describes its methodology through architectural diagrams (Figure 3) and textual descriptions in Section 3 "Methodology", but no explicit pseudocode or algorithm blocks are provided.
Open Source Code Yes Our project page is at https://github.com/IVRL/DisSal. We will make our code publicly available. We implemented our approach using PyTorch and will make our code publicly available.
Open Datasets Yes We report the performance of our methods on three publicly available saliency detection benchmarks. We train our models on 10,000 images of the SALICON (Jiang et al., 2015) dataset, which consists of diverse context-rich images from the MS COCO dataset (Lin et al., 2014). We also fine-tune our SALICON-trained models on the MIT1003 dataset (Judd et al., 2009). In addition, we fine-tune our model on the CAT2000 (Borji & Itti, 2015) dataset.
Dataset Splits Yes The dataset contains 10,000 training, 5,000 validation, and 5,000 test images, which makes it the largest saliency detection dataset to date. We also fine-tune our SALICON-trained models on the MIT1003 dataset (Judd et al., 2009), which consists of 1003 everyday scenes collected from Flickr and Label Me, and evaluate them on the commonly used validation partition of MIT1003, and on the official MIT300 test set, which contains 300 natural images. In addition, we fine-tune our model on the CAT2000 (Borji & Itti, 2015) dataset, which comprises 2000 training and 2000 test images organized in 20 diverse categories. For CAT2000, we use 125 and 50 images across 20 categories to fine-tune and validate our model, respectively.
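The CAT2000 fine-tuning/validation split described above (125 and 50 images per category across 20 categories) can be sketched as follows; the function name, path layout, and random seeding are our own assumptions for illustration, not the authors' code.

```python
import random
from collections import defaultdict

def split_cat2000(image_paths, n_finetune=125, n_val=50, seed=0):
    """Illustrative per-category split for CAT2000-style data.

    The paper reports 125 fine-tuning and 50 validation images per
    category; everything else here is a hypothetical sketch.
    """
    by_category = defaultdict(list)
    for path in image_paths:
        # Assume paths look like "CAT2000/<category>/<image>.jpg".
        category = path.split("/")[-2]
        by_category[category].append(path)

    rng = random.Random(seed)
    finetune, val = [], []
    for category, paths in sorted(by_category.items()):
        rng.shuffle(paths)
        finetune.extend(paths[:n_finetune])
        val.extend(paths[n_finetune:n_finetune + n_val])
    return finetune, val
```

With 20 categories this yields 2,500 fine-tuning and 1,000 validation images, disjoint by construction.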
Hardware Specification Yes We use two NVIDIA V100 GPUs (7 TFLOPS, 32 GB memory).
Software Dependencies No The paper mentions implementing the approach using PyTorch and using the Adam optimizer, but it does not specify version numbers for these software components or for any other libraries.
Experiment Setup Yes During training, we resize all images to 480x640 for the global saliency prediction branch and 300x300 for the object detection one. We use random orthogonal initialization for the decoder layers. Furthermore, we use the Adam optimizer to train the global saliency branch, with an initial learning rate of 10^-4. We set the batch size to 2. We validate the network after each epoch and select the best model from the validation phase to avoid over-fitting. When fine-tuning on MIT1003, we use a batch size of 2 and an initial learning rate of 10^-5. We also initialize our global saliency branch based on the current state-of-the-art model on the MIT/Tuebingen benchmark, namely UNISAL (Droste et al., 2020), with parameters provided by the authors of (Droste et al., 2020).
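The reported training configuration can be sketched in PyTorch. Only the orthogonal decoder initialization, the Adam optimizer, the 10^-4 learning rate, and the batch size of 2 come from the paper; the toy decoder architecture and tensor shapes below are placeholders of our own.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the saliency decoder; the real
# architecture is described in Section 3 of the paper.
decoder = nn.Sequential(
    nn.Conv2d(256, 128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 1, kernel_size=1),
)

# Random orthogonal initialization for the decoder layers, as reported.
for module in decoder.modules():
    if isinstance(module, nn.Conv2d):
        nn.init.orthogonal_(module.weight)
        nn.init.zeros_(module.bias)

# Adam with the reported initial learning rate of 1e-4
# (1e-5 when fine-tuning on MIT1003), batch size 2.
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)

features = torch.randn(2, 256, 30, 40)  # dummy batch; shapes assumed
saliency = decoder(features)            # one saliency channel per image
```

Per-epoch validation with best-checkpoint selection, as the paper describes, would wrap this in a standard training loop.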