Domain-Adversarial Training of Neural Networks

Authors: Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Victor Lempitsky

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for descriptor learning task in the context of person re-identification application.
Researcher Affiliation | Academia | Yaroslav Ganin EMAIL Skolkovo Institute of Science and Technology (Skoltech), Skolkovo, Moscow Region, Russia; Hana Ajakan EMAIL Département d'informatique et de génie logiciel, Université Laval, Québec, Canada, G1V 0A6; Hugo Larochelle EMAIL Département d'informatique, Université de Sherbrooke, Québec, Canada, J1K 2R1
Pseudocode | Yes | Algorithm 1: Shallow DANN stochastic training update
Open Source Code | Yes | Note that we release the source code for the Gradient Reversal layer along with the usage examples as an extension to Caffe (Jia et al., 2014): http://sites.skoltech.ru/compvision/projects/grl/
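The Gradient Reversal layer named above has a simple semantics that can be sketched independently of the authors' Caffe implementation: it is the identity on the forward pass and multiplies the incoming gradient by −λ on the backward pass. The class and parameter names below are illustrative, not taken from the released code.

```python
import numpy as np

class GradientReversal:
    """Minimal sketch of a gradient reversal layer (GRL).

    Forward: identity. Backward: gradients are scaled by -lam, so that
    minimizing the domain-classifier loss downstream maximizes it with
    respect to the feature extractor upstream.
    """

    def __init__(self, lam=1.0):
        self.lam = lam  # adaptation parameter lambda

    def forward(self, x):
        # Acts as the identity during the forward pass.
        return x

    def backward(self, grad_output):
        # Flip (and scale) the gradient flowing to earlier layers.
        return -self.lam * grad_output
```

In a framework with automatic differentiation this is typically implemented as a custom op with an overridden backward pass; the sketch above just makes the sign flip explicit.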
Open Datasets | Yes | We compare the algorithms on the Amazon reviews data set, as pre-processed by Chen et al. (2012). This data set includes four domains, each one composed of reviews of a specific kind of product (books, dvd disks, electronics, and kitchen appliances). ...results on traditional deep learning image data sets such as MNIST (LeCun et al., 1998) and SVHN (Netzer et al., 2011) as well as on Office benchmarks (Saenko et al., 2010) ...we use PRID (Hirzer et al., 2011), VIPeR (Gray et al., 2007), CUHK (Li and Wang, 2013) as target data sets for our experiments.
Dataset Splits | Yes | Given the labeled source sample S and the unlabeled target sample T, we split each set into training sets (S and T respectively, containing 90% of the original examples) and the validation sets (SV and TV respectively). All learning algorithms are given 2 000 labeled source examples and 2 000 unlabeled target examples. Then, we evaluate them on separate target test sets (between 3 000 and 6 000 examples). For VIPeR, we use random 316 persons for training and all others for testing. For CUHK, 971 persons are split into 485 for training and 486 for testing.
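The 90%/10% train/validation split quoted above is a standard shuffled partition; a minimal sketch follows (the function name, seed handling, and list-based representation are our assumptions, not the paper's code).

```python
import random

def split_90_10(examples, seed=0):
    """Shuffle a sample and split it into 90% training / 10% validation,
    mirroring the protocol described in the report row above."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for the sketch
    cut = int(0.9 * len(examples))
    train = [examples[i] for i in idx[:cut]]
    valid = [examples[i] for i in idx[cut:]]
    return train, valid
```

Applied once to the source sample S and once to the target sample T, this yields the (S, SV) and (T, TV) pairs the row describes.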
Hardware Specification | No | Computations were performed on the Colosse supercomputer grid at Université Laval, under the auspices of Calcul Québec and Compute Canada. The operations of Colosse are funded by the NSERC, the Canada Foundation for Innovation (CFI), NanoQuébec, and the Fonds de recherche du Québec – Nature et technologies (FRQNT).
Software Dependencies | No | Note that we release the source code for the Gradient Reversal layer along with the usage examples as an extension to Caffe (Jia et al., 2014).
Experiment Setup | Yes | For the DANN algorithm, the adaptation parameter λ is chosen among 9 values between 10^-2 and 1 on a logarithmic scale. The hidden layer size l is either 50 or 100. Finally, the learning rate µ is fixed at 10^-3. The learning rate is adjusted during stochastic gradient descent using the following formula: µ_p = µ_0 / (1 + α·p)^β, where p is the training progress linearly changing from 0 to 1, µ_0 = 0.01, α = 10 and β = 0.75 (the schedule was optimized to promote convergence and low error on the source domain). A momentum term of 0.9 is also used. The domain adaptation parameter λ is initialized at 0 and is gradually changed to 1 using the following schedule: λ_p = 2 / (1 + exp(−γ·p)) − 1, where γ was set to 10 in all experiments (the schedule was not optimized/tweaked). Finally, note that the model is trained on 128-sized batches (images are preprocessed by mean subtraction).
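The two schedules quoted in the setup row can be written out directly; the formulas are from the excerpt above, while the helper names and defaults are ours.

```python
import math

def learning_rate(p, mu0=0.01, alpha=10.0, beta=0.75):
    """mu_p = mu_0 / (1 + alpha * p)^beta, with training progress p in [0, 1].
    Decays from mu0 at p=0 toward a smaller rate at p=1."""
    return mu0 / (1.0 + alpha * p) ** beta

def adaptation_lambda(p, gamma=10.0):
    """lambda_p = 2 / (1 + exp(-gamma * p)) - 1.
    Ramps the domain adaptation weight from 0 at p=0 toward 1 at p=1."""
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0
```

With the reported constants, the learning rate starts at 0.01 and decays monotonically, while λ starts at exactly 0 and saturates close to 1 by the end of training.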