CiTrus: Squeezing Extra Performance out of Low-data Bio-signal Transfer Learning

Authors: Eloy Geenjaar, Lie Lu

AAAI 2025

Reproducibility assessment (variable, result, LLM response):
Research Type: Experimental. In this paper, we propose a new convolution-transformer hybrid model architecture with masked auto-encoding for low-data bio-signal transfer learning, introduce a frequency-based masked auto-encoding task, employ a more comprehensive evaluation framework, and evaluate how much and when (multimodal) pre-training improves fine-tuning performance. Our findings indicate that the convolution-only part of our hybrid model can achieve state-of-the-art performance on some low-data downstream tasks.
Researcher Affiliation: Collaboration. Eloy Geenjaar (Georgia Institute of Technology) and Lie Lu (Dolby Laboratories). Emails: EMAIL, EMAIL.
Pseudocode: No. The paper describes methods such as masked auto-encoding and the convolution-transformer hybrid (CiTrus) through textual descriptions and diagrams (Figure 1), but provides no explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No. The paper states: "The implementations for the bioFAME, SimMTM, and PatchTST models are taken from their respective official implementations." This refers to external implementations used as baselines, not the authors' own code for CiTrus. No explicit statement of code release, and no repository link for the methodology described in this paper, is provided.
Open Datasets: Yes. Similar to previous work (Zhang et al. 2022), the paper uses SleepEDF, a large sleep EEG dataset (Kemp et al. 2000), for pre-training. The downstream datasets are: an electromyography (EMG) dataset (Goldberger et al. 2000) with 375ms windows, sampled at 4kHz, with a single channel; a gesture recognition dataset (Liu et al. 2009) with 3.15s windows, sampled at 100Hz, with 3 channels; an electromotor fault-detection (FD-B) dataset (Lessmeier et al. 2016) with 80ms windows, sampled at 64kHz, with a single channel; a photoplethysmography (PPG) dataset (Schmidt et al. 2018) with 60s windows, sampled at 64Hz, with a single channel; the HAR dataset (Reyes-Ortiz et al. 2015) with 2.56s windows, sampled at 50Hz, with 6 channels; and an electrocardiogram (ECG) dataset (Moody 1983) with 10s windows, sampled at 250Hz, with 2 channels.
Dataset Splits: Yes. The training, validation, and testing splits were originally defined in the TFC work for the EMG, Gesture, FD-B, and Epilepsy datasets, and were also used by bioFAME and SimMTM. To evaluate transfer-learning models more comprehensively, the authors advocate for and implement a cross-validation procedure that averages test performance across 10 test folds. The training and validation data (the remaining 9 folds) are reduced to the specified data-regime percentage, and this retained data is then split into 75% training and 25% validation data.
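The split procedure described above can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name, the use of scikit-learn's `KFold` and `train_test_split`, and the subsampling details are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def make_splits(n_samples, data_regime=0.1, n_folds=10, seed=42):
    """Yield (train_idx, val_idx, test_idx) for each of the 10 folds.

    Each fold in turn serves as the test split; the remaining 9 folds
    are reduced to `data_regime` (e.g. 10%) of the data, and the
    retained indices are divided 75%/25% into training and validation.
    """
    indices = np.arange(n_samples)
    kfold = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    rng = np.random.default_rng(seed)
    for rest_idx, test_idx in kfold.split(indices):
        # Keep only the requested fraction of the non-test data.
        n_keep = max(2, int(len(rest_idx) * data_regime))
        kept = rng.choice(rest_idx, size=n_keep, replace=False)
        # 75% training / 25% validation.
        train_idx, val_idx = train_test_split(
            kept, test_size=0.25, random_state=seed)
        yield train_idx, val_idx, test_idx
```

For example, with 1000 samples and a 10% data regime, each fold holds out 100 test samples and keeps 90 of the remaining 900 for training and validation.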
Hardware Specification: Yes. The models are pre-trained and fine-tuned on an AWS instance with 4 NVIDIA A10 GPUs, with [42, 1337, 1212, 9999] as the model seeds and 42 as the seed for data randomization and fold generation.
Software Dependencies: No. The paper mentions using specific baseline models (bioFAME, SimMTM, PatchTST) from their official implementations and details hyperparameters for the proposed CiTrus model, but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup: Yes. The models use a 4-layer transformer (64 latent dimensions, 128 feed-forward dimensions, and 8 heads), a 3-layer convolutional network with 32 channels in the first residual convolution layer that double every layer (only applicable to CiTrus), a patch size of 20, a 0.5 masking ratio, and a block masking size of 5 (only applicable to CiTrus). All models are pre-trained for 200 epochs with a batch size of 128 and a learning rate of 0.0001. All models are fine-tuned for 100 epochs with a batch size of 64, without any augmentations, at the same learning rate as during pre-training. The last pre-training checkpoint is used for fine-tuning, and the best fine-tuning checkpoint (selected on the validation set) is evaluated on the test set.
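The block masking hyperparameters reported above (0.5 masking ratio, block size 5 over patches of size 20) can be illustrated with a short sketch. This is an assumption-laden reconstruction, not the paper's implementation; the sampling strategy (repeatedly placing random contiguous blocks until the target ratio is reached) is a guess.

```python
import numpy as np

def block_mask(n_patches, mask_ratio=0.5, block_size=5, seed=0):
    """Return a boolean mask over patches (True = masked).

    Masks contiguous blocks of `block_size` patches until at least
    `mask_ratio` of all patches are masked. Blocks may overlap, so
    the final masked fraction can exceed the target.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_patches, dtype=bool)
    target = int(n_patches * mask_ratio)
    while mask.sum() < target:
        start = rng.integers(0, n_patches - block_size + 1)
        mask[start:start + block_size] = True
    return mask

# Example: a 3.15s Gesture window at 100Hz is 315 samples; with a
# patch size of 20 that yields 15 full patches to mask over.
mask = block_mask(15, mask_ratio=0.5, block_size=5)
```

Block masking (rather than masking independent patches) forces the reconstruction objective to in-paint longer stretches of signal, which is harder to solve from local interpolation alone.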