Modulating early visual processing by language

Authors: Harm de Vries, Florian Strub, Jeremie Mary, Hugo Larochelle, Olivier Pietquin, Aaron C. Courville

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply Conditional Batch Normalization (CBN) to a pre-trained Residual Network (ResNet), leading to the MODulatEd ResNet (MODERN) architecture, and show that this significantly improves strong baselines on two visual question answering tasks. Our ablation study confirms that modulating from the early stages of the visual processing is beneficial.
Researcher Affiliation | Collaboration | Harm de Vries (University of Montreal); Florian Strub (Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL); Jérémie Mary (Univ. Lille, CNRS, Centrale Lille, Inria, UMR 9189 CRIStAL); Hugo Larochelle (Google Brain); Olivier Pietquin (DeepMind); Aaron Courville (University of Montreal)
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided.
Open Source Code | Yes | The source code for our experiments is available at https://github.com/GuessWhatGame.
Open Datasets | Yes | In this paper, we focus on the VQAv1 dataset [1], which contains 614K questions on 204K images.
Dataset Splits | Yes | We train on the training set, do early stopping on the validation set, and report accuracies on test-dev using the evaluation script provided by [1].
Hardware Specification | Yes | We thank NVIDIA for providing access to a DGX-1 machine used in this work.
Software Dependencies | No | The paper names model components such as LSTM, GRU, and ResNet but gives no version numbers for the software libraries or frameworks used.
Experiment Setup | Yes | The hyperparameters are provided in Appendix A.
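In the MODERN paper, CBN keeps the pre-trained ResNet's batch-norm parameters γ and β frozen and predicts per-channel offsets Δγ and Δβ from a question embedding, so that each layer uses γ̂ = γ + Δγ and β̂ = β + Δβ. The NumPy sketch below illustrates that idea for a single layer in training mode (batch statistics only). It is not the authors' code: the function name, the argument layout, and the use of a single linear predictor in place of the small MLP described in the paper are all assumptions made for illustration.

```python
import numpy as np


def conditional_batch_norm(features, question_emb, gamma, beta,
                           w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Hypothetical sketch of Conditional Batch Normalization for one layer.

    features:      (N, C, H, W) feature maps from a ResNet stage.
    question_emb:  (N, D) language embedding, one row per example.
    gamma, beta:   (C,) frozen, pre-trained batch-norm scale and shift.
    w_gamma, w_beta: (D, C) and b_gamma, b_beta: (C,) are the weights of an
    assumed single linear layer that predicts the per-channel deltas.
    """
    # Ordinary batch-norm statistics over batch and spatial dimensions.
    mean = features.mean(axis=(0, 2, 3), keepdims=True)
    var = features.var(axis=(0, 2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)

    # Predict per-example, per-channel offsets from the question embedding.
    delta_gamma = question_emb @ w_gamma + b_gamma  # (N, C)
    delta_beta = question_emb @ w_beta + b_beta     # (N, C)

    # Modulated parameters: frozen values plus language-dependent deltas.
    gamma_hat = gamma[None, :] + delta_gamma
    beta_hat = beta[None, :] + delta_beta

    # Broadcast (N, C) parameters over the spatial dimensions.
    return gamma_hat[:, :, None, None] * normalized + beta_hat[:, :, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2, 3, 4, 4))   # batch of 2, 3 channels, 4x4 maps
    e = rng.normal(size=(2, 5))         # 5-dim question embeddings
    w = rng.normal(scale=0.1, size=(5, 3))
    b = np.zeros(3)
    out = conditional_batch_norm(x, e, np.ones(3), np.zeros(3), w, b, w, b)
    print(out.shape)
```

With all predictor weights and biases set to zero, this layer reduces to ordinary batch normalization with the pre-trained γ and β, so the language input only changes the visual features through the predicted deltas.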
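The accuracies on test-dev come from the evaluation script provided by [1], which compares each predicted answer with the ten human answers collected per question. A simplified sketch of the per-question score is shown below, assuming the leave-one-out form min(1, matches / 3) averaged over the ten subsets of nine human answers. The official script's answer normalization (punctuation, number words, articles) is replaced here by lowercasing and stripping, and the function name is hypothetical.

```python
def vqa_accuracy(predicted, human_answers):
    """Simplified per-question VQA accuracy (illustrative sketch only)."""
    if not human_answers:
        raise ValueError("need at least one human answer")
    # Minimal normalization; the official script does considerably more.
    pred = predicted.strip().lower()
    gts = [a.strip().lower() for a in human_answers]
    total = 0.0
    for i in range(len(gts)):
        # Leave the i-th human answer out and count matches among the rest.
        others = gts[:i] + gts[i + 1:]
        matches = sum(1 for a in others if a == pred)
        # Three or more agreeing annotators count as fully correct.
        total += min(1.0, matches / 3.0)
    return total / len(gts)


if __name__ == "__main__":
    answers = ["cat"] * 3 + ["dog"] * 7
    print(vqa_accuracy("Cat", answers))
```

In the usage example, three of ten annotators agreeing gives roughly 0.9 rather than 1.0: in the three subsets that drop a matching answer, only two matches remain, each subset scores 2/3, and the other seven subsets score 1.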