On the Behavior of Convolutional Nets for Feature Extraction

Authors: Dario Garcia-Gasulla, Ferran Parés, Armand Vilalta, Jonatan Moreno, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, Toyotaro Suzumura

JAIR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper we statistically measure the discriminative power of every single feature found within a deep CNN, when used for characterizing every class of 11 datasets. We seek to provide new insights into the behavior of CNN features, particularly the ones from convolutional layers, as this can be relevant for their application to knowledge representation and reasoning. Our results confirm that low and middle level features may behave differently to high level features, but only under certain conditions.
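The row above quotes the paper's goal: statistically measuring the discriminative power of each individual CNN feature for each class. The paper's exact statistic is not quoted in this report, so the sketch below only illustrates the per-feature, per-class scoring idea with a hypothetical standardized-mean-difference score; `discriminative_power` is an invented helper name, not the authors' method.

```python
import numpy as np

def discriminative_power(acts, labels, cls):
    """Hypothetical per-feature discriminativeness score for one class:
    separation between in-class and out-of-class mean activations,
    scaled by the pooled standard deviation.
    acts: N x F matrix of pooled feature activations, one row per image."""
    in_cls = acts[labels == cls]
    out_cls = acts[labels != cls]
    pooled_std = np.sqrt((in_cls.var(axis=0) + out_cls.var(axis=0)) / 2) + 1e-12
    return np.abs(in_cls.mean(axis=0) - out_cls.mean(axis=0)) / pooled_std

# Synthetic demo: feature 0 is shifted for class 1, so it should score highest.
rng = np.random.default_rng(1)
acts = rng.normal(size=(100, 5))
labels = rng.integers(0, 2, size=100)
acts[labels == 1, 0] += 3.0
scores = discriminative_power(acts, labels, cls=1)
```

A study like the paper's would compute such a score for every filter of every layer, across every class of every dataset, and then compare the resulting distributions across layers.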
Researcher Affiliation | Collaboration | Dario Garcia-Gasulla EMAIL Ferran Parés EMAIL Armand Vilalta EMAIL Jonatan Moreno Barcelona Supercomputing Center (BSC) Jordi Girona 1-3, 08034 Barcelona, Catalonia Eduard Ayguadé Jesús Labarta Ulises Cortés Barcelona Supercomputing Center (BSC) Universitat Politècnica de Catalunya - BarcelonaTech (UPC) Toyotaro Suzumura IBM T.J. Watson Research Center, New York, USA Barcelona Supercomputing Center (BSC)
Pseudocode | No | The paper describes methods in prose and mathematical formulas, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code.
Open Source Code | No | The paper mentions that the VGG16 and VGG19 models (which are third-party models used by the authors) are publicly available at the authors' webpage (footnote 1: 'http://www.robots.ox.ac.uk/vgg/research/very_deep/'). However, there is no explicit statement about releasing the source code for the methodology described in *this* paper, nor is there a direct link to a code repository for their own work.
Open Datasets | Yes | To completely specify TS one needs the label space Y but also an objective predictive function f(·). In our case the function f(·) is defined by a trained CNN (i.e., its architecture and parameters). There are many popular CNN architectures, and various have been used for feature extraction (see Section 2). Since our goal is to explore the behavior of convolutional layers in the feature extraction process, we will use an architecture which follows the most canonical scheme of layers (i.e., conv/pool/conv/pool/.../fc). At the same time, we wish to use a model capable of learning a rich representation language at various levels (i.e., a very deep network). This combination of requirements leads us to use the VGG16 architecture as source of features (Simonyan & Zisserman, 2014). VGG16 is composed of 13 convolutional layers (with 5 pooling layers) and 3 fully-connected layers (see Table 1 for details on the architecture). The only exceptions are Figures 2, 3, 4 and 7, which are obtained using the VGG19 architecture and used here only for illustrative purposes. This architecture is from the same authors and detailed in the same paper. It only differs from VGG16 by having 3 extra convolutional layers: conv3_4, conv4_4 and conv5_4. Results obtained with the VGG19 architecture were consistent with the ones obtained with VGG16 for all experiments. Both models are publicly available at the authors' web page. Once we have defined the source task and domain (TS, DS), let us introduce the publicly available datasets we will consider as target (TT, DT) in our study on transfer learning: 1. The MIT Indoor Scene Recognition dataset (Quattoni & Torralba, 2009) (mit67) consists of different indoor scenes of 67 categories. [...] 11. We also use the validation split of ImageNet 2012 (Russakovsky et al., 2015) (imagenet) as a target problem for comparison purposes.
Dataset Splits | Yes | In our experiments we do not train models using these datasets, which means we do not require the provided train and test splits. Instead, we will merge both splits to make use of all the data available.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory amounts) used to conduct the experiments.
Software Dependencies | No | The paper mentions using VGG16 and VGG19 architectures, and discusses other CNN models like AlexNet, OverFeat, and ResNets. However, it does not specify any particular software frameworks (e.g., TensorFlow, PyTorch) or their versions, nor any other library versions used for implementation or analysis.
Experiment Setup | Yes | In our case, we will use two main parameters which we consider to be coherent with our study. First, each image representation will be built as a result of processing 10 crops of the image (4 corners and middle crop, mirrored) through the CNN and averaging the resulting activations. This is a frequently used methodology for feature extraction (Sharif Razavian et al., 2014; Azizpour et al., 2016), which provides robustness to the resultant representations. Second, we perform a spatial average pooling of each convolutional layer to obtain a single value per filter.
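The two setup parameters quoted above (10-crop averaging and per-filter spatial average pooling) can be sketched as follows. This is a minimal illustration only: `ten_crops` and `spatial_average_pool` are hypothetical helper names, and a real pipeline would feed each crop through the CNN and pool the layer's activation maps, not the raw pixels as done here for brevity.

```python
import numpy as np

def ten_crops(image, crop=224):
    """Return the 4 corner crops and the center crop of an H x W x C image,
    plus their horizontal mirrors (10 crops total)."""
    h, w = image.shape[:2]
    offsets = [(0, 0), (0, w - crop), (h - crop, 0),
               (h - crop, w - crop), ((h - crop) // 2, (w - crop) // 2)]
    crops = [image[y:y + crop, x:x + crop] for y, x in offsets]
    crops += [c[:, ::-1] for c in crops]  # mirrored versions
    return crops

def spatial_average_pool(activations):
    """Collapse an H x W x F activation map to one value per filter
    by averaging over the two spatial dimensions."""
    return activations.mean(axis=(0, 1))

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))
crops = ten_crops(image)
# Stand-in for CNN activations: pool each crop directly, then average
# the 10 per-crop descriptors into a single image representation.
descriptors = [spatial_average_pool(c) for c in crops]
representation = np.mean(descriptors, axis=0)
```

The final `representation` has one value per channel (per filter, in the real setting), matching the paper's description of a single value per filter after spatial average pooling.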