A^2-Nets: Double Attention Networks
Authors: Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive ablation studies and experiments on both image and video recognition tasks for evaluating its performance. |
| Researcher Affiliation | Collaboration | Yunpeng Chen (National University of Singapore); Yannis Kalantidis (Facebook Research); Jianshu Li (National University of Singapore); Shuicheng Yan (Qihoo 360 AI Institute / National University of Singapore); Jiashi Feng (National University of Singapore) |
| Pseudocode | No | The paper provides a computational graph in Figure 2, but it does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | Code and trained models will be released on GitHub soon. |
| Open Datasets | Yes | Kinetics [12] video recognition dataset, ImageNet-1k [13] image classification dataset, UCF-101 [20] |
| Dataset Splits | Yes | For image classification, we report standard single model single 224×224 center crop validation accuracy, following [9, 10]. The UCF-101 contains about 13,320 videos from 101 action categories and has three train/test splits. |
| Hardware Specification | Yes | All experiments are conducted using a distributed K80 GPU cluster |
| Software Dependencies | No | We use MXNet [3] to experiment on the image classification task, and PyTorch [18] on video classification tasks. The paper mentions the names of the software used but does not specify their version numbers. |
| Experiment Setup | Yes | The base learning rate is set to 0.2 and is reduced by a factor of 0.1 at the 20k-th and 30k-th iterations, terminating at the 37k-th iteration; we use 32 GPUs per experiment with a total batch size of 512, training from scratch. In another setup, the base learning rate is set to 0.1 and decreases by a factor of 0.1 when training accuracy saturates. The network takes 8 frames (sampling stride: 8) as input and is trained for 32k iterations with a total batch size of 512 using 64 GPUs; the initial learning rate is set to 0.04 and is decreased in a stepwise manner when training accuracy saturates. |
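The first schedule quoted in the Experiment Setup row (base learning rate 0.2, multiplied by 0.1 at the 20k-th and 30k-th iterations, terminating at 37k) can be sketched as a piecewise-constant step schedule. This is a minimal illustration, not the authors' code; the function name and the boundary handling (the decay applying exactly at the milestone iteration) are assumptions.

```python
def step_lr(iteration, base_lr=0.2, milestones=(20_000, 30_000), gamma=0.1):
    """Piecewise-constant schedule: multiply the LR by `gamma` at each
    milestone iteration already reached. Values follow the quoted setup;
    exact boundary behavior is an assumption."""
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= gamma
    return lr

# Usage: learning rate at a few points of the 37k-iteration run
for it in (0, 20_000, 30_000, 36_999):
    print(it, step_lr(it))
```

In PyTorch, the equivalent built-in is `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[20_000, 30_000]` and `gamma=0.1`, stepped per iteration rather than per epoch.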