NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time-Series Pretraining

Authors: Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We study its transfer performance on a number of univariate and multivariate classification tasks, few shot learning, unsupervised clustering and anomaly detection benchmarks. Our method exhibits remarkable improvement against previous pretraining approaches and establishes the new state of the art, even compared with domain-specific non-learning-based methods.
Researcher Affiliation Collaboration Chenguo Lin (Peking University); Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu (Microsoft Corporation)
Pseudocode No The paper describes methods and equations, such as for the Numerically Multi-scaled Embedding, but does not contain a dedicated pseudocode or algorithm block with structured steps.
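Since the paper gives no pseudocode for the numerically multi-scaled embedding, here is an illustrative sketch of the core idea as described in the setup (9 scales from 10^-4 to 10^4, spaced by factors of 10). The log-space triangular soft assignment below is our own assumption about how a scalar could be distributed over neighbouring scales, not the paper's exact formulation.

```python
import numpy as np

# 9 decade scales from 1e-4 to 1e4, as stated in the experiment setup.
SCALES = 10.0 ** np.arange(-4, 5)

def scale_weights(v, scales=SCALES, eps=1e-12):
    """Soft-assign |v| to its neighbouring decade scales in log10 space.

    The triangular kernel (width of one decade) is an illustrative choice;
    the weights could then mix per-scale learned embeddings of the value.
    """
    logv = np.log10(abs(v) + eps)
    logs = np.log10(scales)
    dist = np.abs(logs - logv)
    w = np.clip(1.0 - dist, 0.0, None)
    return w / (w.sum() + eps)

# A value of 1.0 sits exactly on the 10^0 scale (index 4).
w = scale_weights(1.0)
```

With this scheme a magnitude like 3.0 splits its weight between the 10^0 and 10^1 scales, so nearby magnitudes get similar embeddings regardless of absolute scale.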
Open Source Code Yes Code is available at: https://github.com/chenguolin/NuTime.
Open Datasets Yes To conduct large-scale representation learning, we collect pretraining data by combining existing datasets from various sources, yielding a dataset with over one million time-series sequences. ... (1) The UCR time series archive (Dau et al., 2019), (2) The UEA benchmark (Bagnall et al., 2018) and (3) eight additional datasets used in recent technical papers (Eldele et al., 2021b; Zhang et al., 2022) include: Epilepsy (Andrzejak et al., 2001), Sleep EEG (Kemp et al., 2000), HAR (Anguita et al., 2013), Gesture (Liu et al., 2009), FD-A (Lessmeier et al., 2016), FD-B (Lessmeier et al., 2016), ECG (Clifford et al., 2017) and EMG (Goldberger et al., 2000).
Dataset Splits Yes The original training and testing splits of these datasets are retained, and only the training portions are merged. ... For a fair comparison, we adopt the same training and test split as Zhang et al. (2022), and there are 60 and 13,559 samples in FD-B training and test dataset for classification benchmarking. ... Epilepsy (Andrzejak et al., 2001) ... we use the dataset split by Zhang et al. (2022), having 60 samples for training, 20 samples for validation, and 11,420 samples for testing.
Hardware Specification Yes The pretraining takes 6 hours on 4 V100 GPUs.
Software Dependencies No The paper mentions using a Transformer encoder and AdamW optimizer, but does not provide specific version numbers for any software libraries or dependencies (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup Yes We adopt a 6-layer and 8-head standard Transformer encoder with fixed sinusoidal positional encoding (Vaswani et al., 2017) as the backbone for our experiments. It uses 128-dimensional latent vectors through all of its layers, with 512 dimensions for the MLP hidden layer size. The window size for input patches is 16. For the numerically multi-scaled embedding, we choose to use 9 scales, which range from 10^-4 to 10^4 by factors of 10. ... The learning rate is 2e-3 for a batch size of 2048. The model is trained for a total of 100 epochs with a linear learning rate warm-up in the first 10 epochs of training and a cosine learning rate decay scheduler (Loshchilov & Hutter, 2017) with an end rate of zero. For optimization, we use AdamW (Loshchilov & Hutter, 2018) with β1 = 0.9, β2 = 0.999 and a weight decay of 0.05. For pretraining, we simply choose the data augmentation of random resized crop for the BYOL objective. It randomly crops a sub-sequence from the original data between the range of 80% to 100%, and subsequently resizes the selected sub-sequence to a length of 512 using bilinear interpolation.
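The random-resized-crop augmentation described above (crop 80–100% of the sequence, then resize to length 512) can be sketched in PyTorch as follows; the function name and tensor layout are our own assumptions, since the paper describes the transform but not its implementation.

```python
import torch
import torch.nn.functional as F

def random_resized_crop(x, out_len=512, min_scale=0.8, max_scale=1.0):
    """Crop a random 80-100% sub-sequence and resize it back to out_len.

    x: tensor of shape (batch, channels, length). For 1-D sequences,
    the paper's "bilinear interpolation" corresponds to mode="linear"
    in F.interpolate.
    """
    _, _, length = x.shape
    scale = torch.empty(1).uniform_(min_scale, max_scale).item()
    crop_len = max(1, int(round(length * scale)))
    start = torch.randint(0, length - crop_len + 1, (1,)).item()
    crop = x[:, :, start:start + crop_len]
    return F.interpolate(crop, size=out_len, mode="linear",
                         align_corners=False)

aug = random_resized_crop(torch.randn(2, 1, 512))  # shape (2, 1, 512)
```

Two views produced this way from the same sequence would serve as the augmented pair for the BYOL objective.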