Human Activity Recognition in an Open World

Authors: Derek S. Prijatelj, Samuel Grieggs, Jin Huang, Dawei Du, Ameya Shringi, Christopher Funk, Adam Kaufman, Eric Robertson, Walter J. Scheirer

JAIR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper 1) formalizes the definition of novelty in HAR, building upon the prior definition of novelty in classification tasks; 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark, KOWL-718; 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time; and 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for accommodating any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA.
Researcher Affiliation | Collaboration | Derek S. Prijatelj (EMAIL), Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA; Samuel Grieggs (EMAIL), Department of Mathematical and Computer Sciences, Indiana University of Pennsylvania, Indiana, PA 15705, USA; Jin Huang (EMAIL), Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA; Dawei Du (EMAIL), Ameya Shringi (EMAIL), Christopher Funk (EMAIL), Kitware, Inc., 1712 Route 9, Suite 300, Clifton Park, NY 12065, USA; Adam Kaufman (EMAIL), Eric Robertson (EMAIL), PAR Government, 421 Ridge St, Rome, NY 13440, USA; Walter J. Scheirer (EMAIL), Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
Pseudocode | No | The paper describes its methods and processes in detail, including a protocol for Open World Learning for HAR, but it does not present any of these methods or procedures in a formal pseudocode block or algorithm listing. For example, Section 4, 'Open World Learning Protocol for HAR', describes steps but not in pseudocode.
Open Source Code | Yes | 4. Pipeline code for configuring and running the open world HAR experiments, with a Docker image to both reproduce the Kinetics experiments below and enable any configuration of open world Kinetics experiments in the future. Code repository with containers for reproducing and extending: https://github.com/prijatelj/human-activity-recognition-in-an-open-world
Open Datasets | Yes | 2. An OWL protocol in Section 4 to create experiments to analyze novelty in HAR datasets, applied to Kinetics 400 (Kay et al., 2017), 600 (Carreira et al., 2018), and 700-2020 (Smaira et al., 2020) to create KOWL-718, accompanied by baseline predictors for benchmarking. Appendix C.1, Data Specifics: Collecting the Kinetics datasets is unfortunately not straightforward for such a standard benchmark. The data is officially distributed by Google as a list of YouTube IDs. A large portion of the videos have fallen victim to link rot, where the links are broken and the videos are no longer available, especially in the older Kinetics 400 and 600 sets. Fortunately, there are at least two archives of this data that offer a more complete version of the dataset: a torrent for Kinetics-700 and Kinetics-400, and a repository hosted by the CVDF. This paper used the CVDF repository.
Dataset Splits | Yes | The validation and testing data splits are similarly split using the same class sets per increment such that they are aligned to the training data. Given the number of desired increments N, a labeled dataset that is an unordered sequence of pairs (x, y), and the starting known label set K ⊆ Y, the starting unknown label set U consists of the unique labels not in K, where U = {y : y ∈ Y, y ∉ K}. The pairs with known labels may be split into the N increments in any way desired, such as a stratified split that maintains the balance of the known labels across the increments as much as possible. ... If training, validation, and testing splits are not predefined, each increment from above may be split using typical methodologies, such as an 8:2 train-to-test ratio, or even multi-fold cross validation. ... Table 1: KOWL-718's total known and novel classes with their samples per increment using the configuration in Section 5: most recent label first, first-come-first-assigned data split, and most frequent novel class first per future dataset. Increment 0 uses the Kinetics-400 data splits without the 18,016 validation samples listed above. The test split consists of Kinetics-600's validation and test splits for the increments in range [1, 5] and the Kinetics-700 validation split for increments in range [6, 10], given that the Kinetics-700-2020 test set labels are not yet publicly released. This totals 630,524 training samples and 68,849 test samples overall. *See Appendix C.1 for a breakdown and a note on link rot where videos are lost.
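The increment-splitting protocol quoted above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's released pipeline: the function name `owl_increment_split` and the round-robin policy for assigning novel classes to increments are assumptions; the paper leaves the known-label split policy open (e.g. stratified), and here a uniform random assignment stands in for it.

```python
import random

def owl_increment_split(pairs, known_labels, n_increments, seed=0):
    """Split labeled (x, y) pairs into N increments for open world learning.

    Increment 0 holds only known-label samples; each novel label is
    introduced at one later increment (here: round-robin over 1..N-1,
    a simplified stand-in for the paper's ordering policies).
    """
    rng = random.Random(seed)
    known = set(known_labels)
    # Starting unknown label set U = {y : y in Y, y not in K}.
    unknown = sorted({y for _, y in pairs} - known)

    # Assign each novel class to the increment where it first appears.
    first_seen = {y: 1 + i % (n_increments - 1) for i, y in enumerate(unknown)}

    increments = [[] for _ in range(n_increments)]
    for x, y in pairs:
        if y in known:
            # Known-label pairs may be split in any desired way; a uniform
            # random spread stands in for a stratified split here.
            increments[rng.randrange(n_increments)].append((x, y))
        else:
            increments[first_seen[y]].append((x, y))
    return increments
```

Validation and test pairs would be passed through the same function with the same class sets so that their increments stay aligned with training, as the protocol requires.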
Hardware Specification | Yes | Appendix C.2.3, Hardware, Resources, and Computational Runtimes: These models were run on a variety of compute resources. In order to facilitate experimentation, we first ran feature extraction on the entire unified dataset (KOWL-718) with both the X3D and TimeSformer models. A full run of the feature extraction takes approximately 30 hours for the X3D feature extractor and approximately 23 days for the TimeSformer feature extractor when using an Nvidia Quadro RTX 6000.
Software Dependencies | No | The X3D (Feichtenhofer, 2020) model used in this work uses a pure PyTorch implementation of X3D Multigrid (Wu, Girshick, He, Feichtenhofer, & Krahenbuhl, 2020), found at: https://github.com/kkahatapitiya/X3D-Multigrid. ... The TimeSformer feature extractor used in this work was adapted from the Kinetics-400 reference model distributed by Bertasius et al. (2021) from https://github.com/facebookresearch/TimeSformer. ... While the paper mentions PyTorch and references specific implementations, it does not explicitly state version numbers for these or other software dependencies.
Experiment Setup | Yes | The fine-tuned classifier that serves as our demonstrative baseline for this OWL HAR benchmark is a single fully connected layer on the feature representation that then outputs to the final softmax classification layer, denoted as ANN. The fine-tuned classifier is the predictor's only state updated during the incremental learning. ... The ANNs used for both of the feature representation models were fully connected dense networks with one hidden layer and a Leaky ReLU activation, feeding a softmax classifier layer sized as the number of known classes + 1 for the unknown catch-all class. In experimentation to find a suitable ANN architecture, multiple hidden layers and hidden layer widths were examined, including multiple layers with skip connections through concatenation. In the end, the single-hidden-layer model with the same size as the feature representation vector was found to be the most performant, using dropout with probability 0.5 during training. We also explored different numbers of total training epochs, ranging over [1, 100]. We found that 100 epochs resulted in overfitting, as seen in Fig. 16, which compares the performance of the fine-tuned ANN at 100 epochs versus the original logit predictions of the feature representation's original classifier layer output. The runtime of the single-epoch ANN was approximately half an hour of wall time for the incremental learning experiment, at the cost of an estimated 0.05 MCC performance loss based on the training run on only the Kinetics-400 data.
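The baseline head described above (one hidden layer the width of the feature vector, Leaky ReLU, dropout 0.5, and a known + 1 output for the unknown catch-all) can be sketched in PyTorch. This is a reading of the quoted description, not the authors' released code; the class name `FinetuneANN` and the exact layer ordering are assumptions.

```python
import torch
import torch.nn as nn

class FinetuneANN(nn.Module):
    """Single-hidden-layer classifier head over frozen X3D/TimeSformer features.

    Per the quoted setup: hidden width equals the feature dimension,
    Leaky ReLU activation, dropout p=0.5 during training, and an output
    layer sized known_classes + 1 for the unknown catch-all class.
    Names and layer ordering are illustrative assumptions.
    """
    def __init__(self, feat_dim: int, n_known: int, dropout: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),   # hidden layer, same size as features
            nn.LeakyReLU(),
            nn.Dropout(dropout),
            nn.Linear(feat_dim, n_known + 1),  # +1 for the unknown catch-all
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Returns logits; softmax is applied by the loss (e.g. CrossEntropyLoss).
        return self.net(feats)
```

During incremental learning, only this head would be updated, since the quoted text states the fine-tuned classifier is the predictor's only state changed across increments.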