PaPaGei: Open Foundation Models for Optical Physiological Signals

Authors: Arvind Pillai, Dimitris Spathis, Fahim Kawsar, Mohammad Malekzadeh

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PAPAGEI against state-of-the-art time-series foundation models and self-supervised learning benchmarks across 20 tasks from 10 diverse datasets, spanning cardiovascular health, sleep disorders, pregnancy monitoring, and wellbeing assessment. Our model demonstrates superior performance, improving classification and regression metrics by 6.3% and 2.9% respectively in at least 14 tasks. Notably, PAPAGEI achieves these results while being more data- and parameter-efficient, outperforming models that are 70× larger. Beyond accuracy, we examine model robustness across different skin tones, establishing a benchmark for bias evaluation in future models.
Researcher Affiliation | Collaboration | Arvind Pillai (Dartmouth College, NH, USA); Dimitris Spathis (Nokia Bell Labs, Cambridge, UK; University of Cambridge, UK); Fahim Kawsar (Nokia Bell Labs, Cambridge, UK; University of Glasgow, Scotland, UK); Mohammad Malekzadeh (Nokia Bell Labs, Cambridge, UK)
Pseudocode | No | The paper describes its methodology using descriptive text, mathematical equations (1-5), and a flow diagram (Figure 2), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Models, data, and code are available at: github.com/nokia-bell-labs/papagei-foundation-model
Open Datasets | Yes | The model is pre-trained on over 57,000 hours of data, comprising 20 million unlabeled PPG segments from publicly available datasets. ...To our knowledge, PAPAGEI is the first open foundation model pre-trained on PPG signals, using 57,000 hours of data from 20 million signals sourced entirely from public datasets. ...Databases used: VitalDB (Lee et al., 2022), MIMIC-III (Johnson et al., 2016), MESA (Zhang et al., 2018; Chen et al., 2015), nuMoM2b (Facco et al., 2015), VV (Skin Tone) (Toye, 2023), PPG-BP (Liang et al., 2018a), SDB (Garde et al., 2014), ECSMP (Gao et al., 2021), WESAD (Schmidt et al., 2018), PPG-DaLiA (Reiss et al., 2019).
Dataset Splits | Yes | We initially split the in-domain and out-of-domain datasets into training, validation, and test sets using 80/10/10 and 60/20/20 ratios at the participant-level, respectively. Hyperparameter optimization is performed on the training set using nested cross-validation, thus the validation and test sets are merged for evaluation.
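The participant-level splitting described above (no subject's segments appear in more than one split) can be sketched as follows. This is an illustrative sketch, not the authors' code; the function name, `records` layout, and seed are assumptions.

```python
import random

def participant_level_split(records, train=0.8, val=0.1, seed=0):
    """Split segments into train/val/test at the participant level, so no
    participant contributes to more than one split (cf. the paper's
    80/10/10 in-domain and 60/20/20 out-of-domain ratios).
    `records` maps participant_id -> list of segments (illustrative layout)."""
    pids = sorted(records)
    random.Random(seed).shuffle(pids)
    n = len(pids)
    n_train = int(n * train)
    n_val = int(n * val)
    split_pids = {
        "train": pids[:n_train],
        "val": pids[n_train:n_train + n_val],
        "test": pids[n_train + n_val:],
    }
    # flatten each participant group back into a list of segments
    return {name: [seg for p in ids for seg in records[p]]
            for name, ids in split_pids.items()}
```

Because the ratios are applied to participants rather than segments, split sizes in segments only approximate 80/10/10 when participants contribute unequal numbers of segments.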
Hardware Specification | Yes | We set α = 0.6 and train on eight V100 GPUs for 15,000 steps (lr = 10^-4), with PAPAGEI-P and PAPAGEI-S having 5M and 5.7M parameters, respectively, while previous works use model sizes of 3.3M (Abbaspourazad et al., 2023) (we study scaling in Section 5.2).
Software Dependencies | No | For model training, we primarily used PyTorch (Paszke et al., 2019). The NTXent Loss implementation was sourced from the PyTorch Metric Learning package. The paper mentions PyTorch with a citation and the PyTorch Metric Learning package, but does not provide specific version numbers for these software components.
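For reference, the NT-Xent (normalized temperature-scaled cross-entropy) loss named above can be written out directly. The pure-Python sketch below is illustrative only, not the PyTorch Metric Learning implementation the paper uses; the temperature value is an assumption.

```python
import math

def ntxent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for a batch of paired embeddings (as in SimCLR):
    z1[i] and z2[i] are the two views of example i. Illustrative sketch;
    temperature=0.5 is an assumed default, not taken from the paper."""
    def norm(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    z = [norm(v) for v in z1 + z2]          # 2N L2-normalized embeddings
    n = len(z1)
    loss = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)               # index of i's positive pair
        denom = sum(math.exp(dot(z[i], z[k]) / temperature)
                    for k in range(2 * n) if k != i)
        pos = math.exp(dot(z[i], z[j]) / temperature)
        loss += -math.log(pos / denom)
    return loss / (2 * n)
```

The loss is low when paired views are close and all other (negative) pairs are far apart; production code would compute the same quantity with batched tensor ops.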
Experiment Setup | Yes | We adopt a ResNet-style CNN encoder, following (Ding et al., 2024). ...Our model has 18 convolutional blocks, starting with a filter size of 32, which doubles every 4 blocks. The projection layer is a single FC layer, generating a 512-d embedding. In the PAPAGEI-S variant, the expert block (M1 & M2) uses three parallel FCNNs, each with two FC layers, resulting in a 128-d embedding. For augmentations, PAPAGEI-P uses cropping (0.50), negation (0.20), flipping (0.20), and scaling (0.40). PAPAGEI-S uses cropping (0.25) and Gaussian noise (0.25). ...We set α = 0.6 and train on eight V100 GPUs for 15,000 steps (lr = 10^-4), with PAPAGEI-P and PAPAGEI-S having 5M and 5.7M parameters, respectively... We use a batch size of 128 for training...
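The numbers in parentheses above are per-augmentation application probabilities. A minimal sketch of such a stochastic pipeline for the PAPAGEI-P settings is given below; the function name, crop window, and scale range are assumptions, since the paper only reports the probabilities.

```python
import random

def augment_ppg(segment, rng=None):
    """Apply the PAPAGEI-P augmentations independently, each with its
    reported probability: cropping (0.50), negation (0.20), flipping
    (0.20), scaling (0.40). Crop and scale ranges are assumed here."""
    rng = rng or random.Random()
    x = list(segment)
    if rng.random() < 0.50:
        # crop: keep a random contiguous window of >= 50% of the signal (assumed range)
        keep = rng.randint(len(x) // 2, len(x))
        start = rng.randint(0, len(x) - keep)
        x = x[start:start + keep]
    if rng.random() < 0.20:
        x = [-v for v in x]                 # negation
    if rng.random() < 0.20:
        x = x[::-1]                         # time flip
    if rng.random() < 0.40:
        s = rng.uniform(0.5, 2.0)           # scaling factor range (assumed)
        x = [s * v for v in x]
    return x
```

Each augmentation fires independently per segment, so on average a segment receives about 1.3 of the four transforms per draw.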