ZeroHAR: Sensor Context Augments Zero-Shot Wearable Action Recognition
Authors: Ranak Roy Chowdhury, Ritvik Kapila, Ameya Panse, Xiyuan Zhang, Diyan Teng, Rashmi Kulkarni, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested our method against eight baselines on five benchmark HAR datasets with various sensors, placements, and activities. Our model shows exceptional generalizability across the 18 motion time series classification benchmark datasets, outperforming the best baselines by 262% in the zero-shot setting. We extensively evaluate ZeroHAR with 12 baselines on 18 benchmark HAR datasets, covering a wide variety of numbers and types of IMU sensors and a range of human motions. ZeroHAR resulted in a 262% average improvement in Zero-Shot Accuracy over the 2nd-best results. |
| Researcher Affiliation | Collaboration | Ranak Roy Chowdhury1, Ritvik Kapila1, Ameya Panse1, Xiyuan Zhang1, Diyan Teng2, Rashmi Kulkarni2, Dezhi Hong3*, Rajesh K. Gupta1, Jingbo Shang1; 1 University of California San Diego, 2 Qualcomm, 3 Amazon. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 (Stage I: Motion with Sensor Context Learning). Input: Dtr, B, M, W, ILM; Hyper-parameters: τ; Output: trained (K and R)... Algorithm 2 (Stage II: Action Recognition). Input: Dtr, G, LLM, ILM, trained K, P, R from Stage I; Hyper-parameters: c; Output: trained (K, P, and R). |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code, nor does it provide a link to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | We evaluate on a comprehensive motion time series classification benchmark, comprising 18 real-world datasets that cover diverse activities. These datasets are collected from various body locations such as head, chest, back, arm, wrist, waist, hip, leg, knee and ankle. We categorize these datasets into three difficulty levels: (1) easy level (with fewer than 10 activities): Opportunity (Roggen et al. 2010), UCI-HAR (Anguita et al. 2013), Motion Sense (Malekzadeh et al. 2019), w-HAR (Showail 2022), Shoaib (Shoaib et al. 2014), HAR70+ (Ustad et al. 2023), Real World (Sztyler and Stuckenschmidt 2016), TNDA-HAR (Yang et al. 2024); (2) medium level (with 10 to 20 activities): PAMAP2 (Reiss and Stricker 2012), USC-HAD (Zhang and Sawchuk 2012), Mhealth (Oresti et al. 2014), Harth (Logacjov et al. 2021), UT-Complex (Shoaib et al. 2016), Wharf (Bruno et al. 2013), WISDM (Weiss 2019), DSADS (Altun, Barshan, and Tunçel 2010); (3) hard level (with more than 20 activities): UTD-MHAD (Chen, Jafari, and Kehtarnavaz 2015), MMAct (Kong et al. 2019). |
| Dataset Splits | No | Fig. 3 illustrates the train, validation, and test sets for ZeroHAR. The test set contains novel classes, U, unseen during training. To enable early stopping, we reserve data from novel classes, O, to form a validation set, Ova, at the start of training. During Stage II, ZeroHAR trains only on Otr, where Otr = O \ Ova. Additionally, part of Otr is reserved for Stage I validation. Each row represents #samples and each column represents #classes. Note that #samples in each set may differ. The paper describes the strategy for creating splits but does not provide specific percentages or absolute counts for the training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions models like GPT-4, ImageBind, BERT, and GPT, as well as Adam optimizers, but does not provide specific version numbers for the general software environment, programming languages, or libraries used for implementation (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We set the temperature parameter τ, in Algorithm 1, to 0.05 and the number of descriptions per activity, c, in Algorithm 2 to 10. We use Adam optimizers to update the IMU modality (IMU Encoder and IMU projectors P and Q for Stage I and II, respectively) and the text modality (text projector R). We use a batch size of 128, learning rate of 0.001, 8 self-attention layers with 8 heads for the IMU Encoder, a dropout of 0.01 and a hidden dimension, h, of 128, for both Stage I and II. We save the model with the lowest validation loss and evaluate it on the test set. |
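The Pseudocode row describes a Stage I objective governed by a temperature hyper-parameter τ, which the Experiment Setup row fixes at 0.05. A minimal NumPy sketch of a symmetric, temperature-scaled InfoNCE-style contrastive loss between paired IMU and text embeddings; the function name and the assumption of L2-normalized inputs are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def info_nce_loss(imu_emb, text_emb, tau=0.05):
    """Symmetric InfoNCE-style contrastive loss between two modalities.

    imu_emb, text_emb: (n, d) arrays of L2-normalized embeddings, where
    row i of each matrix forms a positive pair; tau is the temperature
    (0.05 in the paper's reported setup).
    """
    logits = imu_emb @ text_emb.T / tau          # (n, n) similarity matrix
    labels = np.arange(len(logits))              # positives lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # stabilize the softmax
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the IMU->text and text->IMU directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

A lower τ sharpens the softmax over similarities, so mismatched pairs are penalized more aggressively than at τ = 1.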
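The Dataset Splits row describes a zero-shot protocol: test classes U are never seen in training, and a second pool of novel classes O is divided into a validation portion Ova and a training portion Otr = O \ Ova. A hypothetical sketch of that partition over labeled samples; the `val_frac` ratio is an assumption, since the paper reports no exact split sizes:

```python
import random

def zero_shot_split(samples, unseen_classes, novel_classes, val_frac=0.2, seed=0):
    """Partition (x, label) pairs following the described protocol:
    - test: classes in `unseen_classes` (U), never seen during training;
    - novel classes O split into a validation pool Ova and Otr = O \\ Ova;
    - all remaining classes join the base training set.
    `val_frac` is illustrative; the paper gives no percentages.
    """
    rng = random.Random(seed)
    test = [s for s in samples if s[1] in unseen_classes]
    novel = [s for s in samples if s[1] in novel_classes]
    base_train = [s for s in samples
                  if s[1] not in unseen_classes | novel_classes]

    rng.shuffle(novel)
    n_val = int(len(novel) * val_frac)
    val, novel_train = novel[:n_val], novel[n_val:]  # Ova and Otr
    return base_train + novel_train, val, test
```

The key invariant is that no sample of a test class ever reaches the training or validation sets, which is what makes the final evaluation zero-shot.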
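The Experiment Setup row states that the checkpoint with the lowest validation loss is the one evaluated on the test set. A framework-free sketch of that selection loop; the `train_epoch`, `val_loss`, and `snapshot` callables are assumptions standing in for whatever training harness the authors used:

```python
def train_with_best_checkpoint(train_epoch, val_loss, snapshot, n_epochs=100):
    """Run training and keep the snapshot with the lowest validation loss.

    train_epoch(): runs one optimization epoch (mutates the model in place).
    val_loss():    returns the current validation loss as a float.
    snapshot():    returns a copy of the current model state.
    """
    best_loss, best_state = float("inf"), None
    for _ in range(n_epochs):
        train_epoch()
        loss = val_loss()
        if loss < best_loss:               # strictly better: keep this checkpoint
            best_loss, best_state = loss, snapshot()
    return best_state, best_loss
```

Returning a saved copy rather than the final model matters here, because the zero-shot validation loss need not decrease monotonically over epochs.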