Human Activity Recognition in an Open World

Authors: Derek S. Prijatelj, Samuel Grieggs, Jin Huang, Dawei Du, Ameya Shringi, Christopher Funk, Adam Kaufman, Eric Robertson, Walter J. Scheirer

JAIR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper 1) formalizes the definition of novelty in HAR, building upon the prior definition of novelty in classification tasks; 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark, KOWL-718; 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time; and 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for accommodating any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA.
Researcher Affiliation | Collaboration | Derek S. Prijatelj (EMAIL), Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA; Samuel Grieggs (EMAIL), Department of Mathematical and Computer Sciences, Indiana University of Pennsylvania, Indiana, PA 15705, USA; Jin Huang (EMAIL), Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA; Dawei Du (EMAIL), Ameya Shringi (EMAIL), Christopher Funk (EMAIL), Kitware, Inc., 1712 Route 9, Suite 300, Clifton Park, NY 12065, USA; Adam Kaufman (EMAIL), Eric Robertson (EMAIL), PAR Government, 421 Ridge St, Rome, NY 13440, USA; Walter J. Scheirer (EMAIL), Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
Pseudocode | No | The paper describes its methods and processes in detail, including a protocol for Open World Learning for HAR, but it does not present any of these methods or procedures in a formal pseudocode block or algorithm listing. For example, Section 4, 'Open World Learning Protocol for HAR', describes steps but not in pseudocode.
Open Source Code | Yes | 4. Pipeline code for configuring and running the open world HAR experiments, with a Docker image to both reproduce the Kinetics experiments below and enable any configuration of open world Kinetics experiments in the future. Code repository with containers for reproducing and extending: https://github.com/prijatelj/human-activity-recognition-in-an-open-world
Open Datasets | Yes | 2. An OWL protocol in Section 4 to create experiments to analyze novelty in HAR datasets, applied to Kinetics 400 (Kay et al., 2017), 600 (Carreira et al., 2018), and 700-2020 (Smaira et al., 2020) to create KOWL-718, accompanied by baseline predictors for benchmarking. Appendix C.1, Data Specifics: Collecting the Kinetics datasets is unfortunately not straightforward for such a standard benchmark. The data is officially distributed by Google as a list of YouTube IDs. A large portion of the videos have fallen victim to link rot, where the links are broken and the videos are no longer available, especially in the older Kinetics 400 and 600 sets. Fortunately, there are at least two archives of this data that offer a more complete version of the dataset: a torrent for Kinetics-700 and Kinetics-400, and a repository hosted by the CVDF. This paper used the CVDF repository.
Dataset Splits | Yes | The validation and testing data splits are similarly split using the same class sets per increment such that they are aligned to the training data. Given the number of desired increments N, a labeled dataset that is an unordered sequence of pairs (x, y), and the starting known label set K ⊆ Y, the starting unknown label set U consists of the unique labels not in K, where U = {y : y ∈ Y, y ∉ K}. The pairs with known labels may be split into the N increments in any way desired, such as a stratified split that maintains the balance of the known labels across the increments as much as possible. ... If training, validation, and testing splits are not predefined, each increment from above may be split using typical methodologies, such as an 8:2 train-to-test ratio, or even multi-fold cross validation. ... Table 1: KOWL-718's total known and novel classes with their samples per increment using the configuration in Section 5: most recent label first, first-come-first-assigned data split, and most frequent novel class first per future dataset. Increment 0 uses the Kinetics-400 data splits without the 18,016 validation samples listed above. The test split consists of Kinetics-600's validation and test splits for the increments in range [1, 5] and the Kinetics-700 validation split for increments in range [6, 10], given that the Kinetics-700-2020 test set labels are not yet publicly released. This totals 630,524 training samples and 68,849 test samples overall. *See Appendix C.1 for a breakdown and a note on link rot where videos are lost.
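The increment-splitting protocol quoted above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's released pipeline: the function name `owl_increment_split` and the round-robin policy for assigning novel classes to increments are assumptions; the paper leaves the known-label split policy open (e.g. stratified), and here a uniform random assignment stands in for it.

```python
import random

def owl_increment_split(pairs, known_labels, n_increments, seed=0):
    """Split labeled (x, y) pairs into N increments for open world learning.

    Increment 0 holds only known-label samples; each novel label is
    introduced at one later increment (here: round-robin over 1..N-1,
    a simplified stand-in for the paper's ordering policies).
    """
    rng = random.Random(seed)
    known = set(known_labels)
    # Starting unknown label set U = {y : y in Y, y not in K}.
    unknown = sorted({y for _, y in pairs} - known)

    # Assign each novel class to the increment where it first appears.
    first_seen = {y: 1 + i % (n_increments - 1) for i, y in enumerate(unknown)}

    increments = [[] for _ in range(n_increments)]
    for x, y in pairs:
        if y in known:
            # Known-label pairs may be split in any desired way; a uniform
            # random spread stands in for a stratified split here.
            increments[rng.randrange(n_increments)].append((x, y))
        else:
            increments[first_seen[y]].append((x, y))
    return increments
```

Validation and test pairs would be passed through the same function with the same class sets so that their increments stay aligned with training, as the protocol requires.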
Hardware Specification | Yes | Appendix C.2.3, Hardware, Resources, and Computational Runtimes: These models were run on a variety of compute resources. In order to facilitate experimentation, we first ran feature extraction on the entire unified dataset (KOWL-718) with both the X3D and TimeSformer models. A full run of the feature extraction takes approximately 30 hours for the X3D feature extractor and approximately 23 days for the TimeSformer feature extractor when using an Nvidia Quadro RTX 6000.
Software Dependencies | No | The X3D (Feichtenhofer, 2020) model used in this work uses a pure PyTorch implementation of X3D Multigrid (Wu, Girshick, He, Feichtenhofer, & Krahenbuhl, 2020), found at: https://github.com/kkahatapitiya/X3D-Multigrid. ... The TimeSformer feature extractor used in this work was adapted from the Kinetics-400 reference model distributed by Bertasius et al. (2021) from https://github.com/facebookresearch/TimeSformer. ... While the paper mentions PyTorch and references specific implementations, it does not explicitly state version numbers for these or other software dependencies.
Experiment Setup | Yes | The fine-tuned classifier that serves as our demonstrative baseline for this OWL HAR benchmark is a single fully connected layer on the feature representation that then outputs to the final softmax classification layer, denoted as ANN. The fine-tuned classifier is the predictor's only state updated during the incremental learning. ... The ANNs used for both of the feature representation models were fully connected dense networks with one hidden layer and a Leaky ReLU activation, feeding a softmax classifier layer sized as the number of known classes + 1 for the unknown catch-all class. In experimentation to find a suitable ANN architecture, multiple hidden layers and hidden layer widths were examined, including multiple layers with skip connections through concatenation. In the end, the single-hidden-layer model with the same size as the feature representation vector was found to be the most performant, using dropout with probability 0.5 during training. We also explored different numbers of total training epochs, ranging over [1, 100]. We found that 100 epochs resulted in overfitting, as seen in Fig. 16, which compares the performance of the fine-tuned ANN at 100 epochs versus the original logit predictions of the feature representation's original classifier layer output. The runtime of the single-epoch ANN was approximately half an hour of wall time for the incremental learning experiment, at the cost of an estimated 0.05 MCC performance loss based on the training run on only the Kinetics-400 data.
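The baseline head described above (one hidden layer the width of the feature vector, Leaky ReLU, dropout 0.5, and a known + 1 output for the unknown catch-all) can be sketched in PyTorch. This is a reading of the quoted description, not the authors' released code; the class name `FinetuneANN` and the exact layer ordering are assumptions.

```python
import torch
import torch.nn as nn

class FinetuneANN(nn.Module):
    """Single-hidden-layer classifier head over frozen X3D/TimeSformer features.

    Per the quoted setup: hidden width equals the feature dimension,
    Leaky ReLU activation, dropout p=0.5 during training, and an output
    layer sized known_classes + 1 for the unknown catch-all class.
    Names and layer ordering are illustrative assumptions.
    """
    def __init__(self, feat_dim: int, n_known: int, dropout: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),   # hidden layer, same size as features
            nn.LeakyReLU(),
            nn.Dropout(dropout),
            nn.Linear(feat_dim, n_known + 1),  # +1 for the unknown catch-all
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Returns logits; softmax is applied by the loss (e.g. CrossEntropyLoss).
        return self.net(feats)
```

During incremental learning, only this head would be updated, since the quoted text states the fine-tuned classifier is the predictor's only state changed across increments.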