Fast and Expressive Gesture Recognition using a Combination-Homomorphic Electromyogram Encoder
Authors: Niklas Smedemark-Margulies, Yunus Bicer, Elifnur Sunger, Tales Imbiriba, Eugene Tunik, Deniz Erdogmus, Mathew Yarossi, Robin Walters
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the proposed method, we collect and release a real-world EMG dataset, and measure the effect of augmented supervision against two baselines: a partially-supervised model trained with only single gesture data from the unseen subject, and a fully-supervised model trained with real single and real combination gesture data from the unseen subject. We find that the proposed method provides a dramatic improvement over the partially-supervised model, and achieves a useful classification accuracy that in some cases approaches the performance of the fully-supervised model. |
| Researcher Affiliation | Academia | 1Khoury College of Computer Sciences, Northeastern University 2Department of Electrical and Computer Engineering, Northeastern University 3Department of Physical Therapy, Movement, and Rehabilitation Sciences, Northeastern University |
| Pseudocode | No | The paper includes figures (e.g., Figure 1, 2, 3) illustrating model architectures and processes, and descriptions of methods in text, but it does not contain any explicitly labeled pseudocode blocks or algorithms in a structured, code-like format. |
| Open Source Code | Yes | Code to reproduce all experiments can be found at https://github.com/nik-sm/com-hom-emg |
| Open Datasets | Yes | Dataset can be downloaded at https://zenodo.org/doi/10.5281/zenodo.10291624 |
| Dataset Splits | Yes | For each choice of model hyperparameters, we repeat model pretraining and evaluation 50 times; this includes 10-fold cross-validation and 5 random seeds. During the calibration and test phase, the encoder is frozen, as shown in Figure 3. In each cross-validation fold, the pretraining dataset D_Pre contains data from 9 subjects; 1 of these 9 is used for early stopping based on validation performance. Data from a 10th subject is divided in a stratified 80/20 split to form the calibration and test datasets D_Calib and D_Test. Specifically, this 80/20 split is stratified as follows. We use the encoder to obtain real feature vectors from all of the test subject's data (Z_dir, Z_mod, Z_comb), and divide each of these portions of data to obtain a calibration dataset and a test dataset. |
| Hardware Specification | No | The paper describes the EMG data acquisition hardware: "Surface electromyography (sEMG, Trigno, Delsys Inc., 99.99% Ag electrodes, 1926 Hz sampling frequency, common mode rejection ratio: > 80 dB, built-in 20–450 Hz bandpass filter) was recorded from 8 electrodes attached to the right forearm with adhesive tape." However, it does not specify any hardware details (e.g., GPU, CPU models, or cloud resources) used for model training or inference. |
| Software Dependencies | No | The encoder model F_θF is implemented in PyTorch (Paszke et al., 2019). When training a fresh classifier G_Test for the unseen test subject, we use Random Forest implemented in Scikit-Learn (Pedregosa et al., 2011). Custom software for data acquisition was developed using the LabGraph (Feng et al., 2021) Python package. Specific version numbers for PyTorch, Scikit-Learn, or Python itself are not provided. |
| Experiment Setup | Yes | The encoder is pre-trained for 300 epochs of gradient descent using the AdamW optimizer (Loshchilov and Hutter, 2017) with default values of β1 = 0.9 and β2 = 0.999 and a fixed learning rate of 0.0003. The encoder's output feature dimension K is set to 64. When performing augmented training, we first create all possible synthetic items using Combine All Pairs, and then we select a random N = 500 items for each of the 16 combination gesture classes. In all experiments, we set the triplet margin parameter γ = 1.0. When using the basic triplet loss strategy, we sample N = 3 random triplets without replacement for each item. When using the centroids triplet loss strategy, we use a momentum value of M = 0.9 to update the centroids. In each training batch, we consider each class of data present, and add freshly sampled white Gaussian noise such that the signal-to-noise ratio (SNR) of the data is roughly 20 decibels. |
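The stratified 80/20 calibration/test split described in the Dataset Splits row can be sketched as below. The feature array `Z`, label array `y`, and class counts are hypothetical stand-ins for the held-out subject's encoded data, not values from the released code:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 64))    # hypothetical 64-dim encoder features
y = np.repeat(np.arange(4), 50)   # hypothetical: 4 gesture classes, 50 items each

# Stratified 80/20 split: each class keeps the same proportion in both parts.
Z_calib, Z_test, y_calib, y_test = train_test_split(
    Z, y, test_size=0.2, stratify=y, random_state=0)
```

With 50 items per class, stratification puts exactly 10 items of each class into the test portion.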
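The per-batch noise augmentation (freshly sampled white Gaussian noise at roughly 20 dB SNR) can be illustrated with a short sketch. The function name `add_noise_at_snr` and the test signal are assumptions for illustration, not taken from the paper's code:

```python
import numpy as np

def add_noise_at_snr(x: np.ndarray, snr_db: float = 20.0, rng=None) -> np.ndarray:
    """Add white Gaussian noise so the result has roughly `snr_db` dB SNR."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(x ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
```

Sampling the noise freshly for each batch, as the paper describes, simply means calling this with a live generator rather than caching one noisy copy of the data.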
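The triplet margin parameter γ = 1.0 enters a standard margin-based triplet loss; a minimal NumPy sketch of that loss (not the paper's PyTorch implementation) is:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin): pull anchor toward the positive,
    push it at least `margin` farther from the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive is.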