Scaling Wearable Foundation Models
Authors: Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jacob Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, Daniel McDuff
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, accelerometer, electrodermal activity, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation across both time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks including exercise and activity recognition. |
| Researcher Affiliation | Collaboration | Co-first and corresponding authors are marked; affiliations listed are 1. Google Research, 2. Google DeepMind, 3. University of Washington (EMAIL, EMAIL). |
| Pseudocode | No | The paper describes methods in regular paragraph text and figures, but does not include any structured pseudocode or algorithm blocks labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper mentions building methods upon the Scenic project (Dehghani et al., 2022) and provides its GitHub link: 'github.com/google-research/scenic'. However, this refers to a third-party codebase used by the authors, not the explicit release of the source code for the methodology described in this paper (LSM). |
| Open Datasets | No | We support open science principles and the value of open data for scientific research; however, we have to balance these considerations with the privacy of the participants and protection of their health data. Although the training data could be de-identified, some of the data streams could not be fully anonymized. We recognize that the inability to share data of this kind is a limitation; however we believe that the presented results enable us to share valuable insights with the community. |
| Dataset Splits | Yes | The dataset was split 80-20, based on subjects, into train-test splits (132,072 subjects in training, 33,018 subjects in testing) as described in Table 2(a). |
| Hardware Specification | Yes | We pretrain our models on Google v5e TPUs with a total batch size of 4,096 across 50,000 training steps. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and building methods upon the Scenic project, implemented in JAX with Flax. However, it does not provide specific version numbers for these software components (e.g., JAX version, Flax version, or the specific AdamW implementation version) or any other libraries. |
| Experiment Setup | Yes | We pretrain our models on Google v5e TPUs with a total batch size of 4,096 across 50,000 training steps. The training process uses the AdamW optimizer with a base learning rate of 5e-3 and weight decay set to 1e-4. A linear warm-up schedule is applied for the first 2,500 steps, followed by a cosine learning rate decay to zero. All pretraining experiments use a 0.8 masking ratio (masking out random patches that cover 80% of the total input signals). |
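The split and setup rows above pin down three reproducible details: an 80/20 subject-level train/test split (132,072 / 33,018 of 165,090 subjects), a linear warmup followed by cosine decay to zero, and random patch masking at a 0.8 ratio. A minimal Python sketch of these pieces follows; the function names, the fixed seed, and the use of uniform random sampling are our own illustrative assumptions, not details taken from the paper.

```python
import math
import random

def subject_level_split(subject_ids, train_frac=0.8, seed=0):
    """Split by subject (not by sample), so no individual's data
    appears in both the train and test sets."""
    ids = sorted(subject_ids)
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)
    return ids[:n_train], ids[n_train:]

def lr_schedule(step, base_lr=5e-3, warmup_steps=2_500, total_steps=50_000):
    """Linear warmup to base_lr over the first 2,500 steps,
    then cosine decay to zero at step 50,000."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

def sample_masked_patches(num_patches, mask_ratio=0.8, seed=0):
    """Pick random patch indices to mask (80% of the input patches)."""
    rng = random.Random(seed)
    return set(rng.sample(range(num_patches), int(num_patches * mask_ratio)))

# 132,072 + 33,018 = 165,090 subjects in total, split 80/20 by subject.
train_ids, test_ids = subject_level_split(range(165_090))
```

With these parameters, `subject_level_split` reproduces the reported counts exactly (165,090 × 0.8 = 132,072), and `lr_schedule` rises linearly to 5e-3 at step 2,500 before decaying to zero at step 50,000.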