Unsupervised Learning of View-invariant Action Representations
Authors: Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our learned representations for action recognition on multiple datasets. Our method outperforms state-of-the-art unsupervised methods across multiple datasets. (Section 4: Experiments) |
| Researcher Affiliation | Academia | Junnan Li, Grad. School for Integrative Sciences and Engineering, National University of Singapore, Singapore, EMAIL; Yongkang Wong, School of Computing, National University of Singapore, Singapore, EMAIL; Qi Zhao, Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, USA, EMAIL; Mohan S. Kankanhalli, School of Computing, National University of Singapore, Singapore, EMAIL |
| Pseudocode | No | The paper describes the components of the learning framework and the optimization process but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for their methodology. |
| Open Datasets | Yes | We use the NTU RGB+D dataset [49] for unsupervised representation learning. |
| Dataset Splits | Yes | For cross-subject evaluation, we follow the same training and testing split as in [49]. For cross-view evaluation, samples of cameras 2 and 3 are used for training while those of camera 1 for testing. Since we need at least two cameras for our unsupervised task, we randomly divide the supervised training set with ratio of 8:1 for unsupervised training and test. |
| Hardware Specification | No | The paper describes computational parameters such as mini-batch size and optimizer settings but does not provide specific details on the hardware (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions deep learning architectures and optimizers (e.g., ResNet-18, Bi-directional convolutional LSTM, Adam optimizer) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Implementation details. For Conv in encoder and depth CNN in cross-view decoder, we employ the ResNet-18 architecture [15] up until the final convolution layer, and add a 1×1×64 convolutional layer to reduce the feature size. ... For BiLSTM, we use convolutional filters of size 7×7×64 for convolution with input and hidden state. We initialize all weights following the method in [14]. During training, we use a mini-batch of size 8. We train the model using the Adam optimizer [20], with an initial learning rate of 1e-5 and a weight decay of 5e-4. We decrease the learning rate by half every 20000 steps (mini-batches). To avoid distracting the flow prediction task, we activate the view adversarial training after 5000 steps. The weights of the loss terms are set as α = 0.5 and β = 0.05, which are determined via cross-validation. |
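The reported schedule (initial learning rate 1e-5 halved every 20000 mini-batches, view-adversarial loss switched on after 5000 steps with weight β = 0.05) can be expressed as two small framework-agnostic helpers. This is a sketch of the schedule only, not the authors' code; the function names `step_lr` and `adversarial_weight` are our own.

```python
def step_lr(step, base_lr=1e-5, decay_steps=20000, factor=0.5):
    """Learning rate after `step` mini-batches: halved every `decay_steps`."""
    return base_lr * factor ** (step // decay_steps)


def adversarial_weight(step, warmup_steps=5000, beta=0.05):
    """View-adversarial loss weight: zero during warm-up, beta afterwards."""
    return 0.0 if step < warmup_steps else beta
```

For example, `step_lr(0)` returns 1e-5, `step_lr(20000)` returns 5e-6, and `adversarial_weight(4999)` is 0.0 while `adversarial_weight(5000)` is 0.05. In PyTorch the same decay would typically be configured via `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20000, gamma=0.5)` with the scheduler stepped once per mini-batch.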