iMoT: Inertial Motion Transformer for Inertial Navigation

Authors: Son Minh Nguyen, Duc Viet Le, Paul Havinga

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on various inertial datasets demonstrate that iMoT significantly outperforms state-of-the-art methods, delivering superior robustness and accuracy in trajectory reconstruction.
Researcher Affiliation | Academia | Son Minh Nguyen, Duc Viet Le, and Paul Havinga; Department of Computer Science, University of Twente, Enschede, The Netherlands. {m.s.nguyen; v.d.le; p.j.m.havinga}@utwente.nl
Pseudocode | No | The paper describes its methods through textual descriptions and diagrams (e.g., Figure 1 and Figure 2) but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code: https://github.com/Minh-Son-Nguyen/iMoT
Open Datasets | Yes | Four popular benchmark datasets are used for evaluation: RIDI (Yan, Shan, and Furukawa 2018), RoNIN (Herath, Yan, and Furukawa 2020), OxIOD (Chen et al. 2018b), and IDOL (Sun, Melamed, and Kitani 2021).
Dataset Splits | No | The paper evaluates performance on 'seen' and 'unseen' subjects, emphasizing generalization to unseen subjects. However, it provides no percentages, sample counts, or explicit methodology for how the datasets are split into training, validation, and test sets, nor for how seen and unseen subjects are partitioned for reproducibility.
Hardware Specification | Yes | The training is performed with PyTorch version 2.4.0 on an H100 GPU with 80 GB of memory.
Software Dependencies | Yes | PyTorch version 2.4.0.
Experiment Setup | Yes | To initialize the PSD module in the encoder, AvgPool with kernel sizes k1 = 9 and k2 = 3 is implemented. For decoding, a set of P = 128 query motion particles representing 128 motion modes is empirically found sufficient. Depending on the sampling rate of each dataset, the token dimension is set to 100 for IMU sequences recorded at 100 Hz and to 200 for sequences recorded at 200 Hz. The network, consisting of N = 2 encoder layers and M = 2 decoder layers, is trained with a learning rate of 1e-4 and a batch size of B = 128 using Adam optimization.
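The reported hyperparameters can be gathered into a small configuration sketch for anyone re-running the experiments. This is pure Python; the class and field names (`IMoTConfig`, `token_dim`, etc.) are illustrative and not taken from the authors' code release.

```python
from dataclasses import dataclass

@dataclass
class IMoTConfig:
    """Training hyperparameters as reported in the paper (names are hypothetical)."""
    num_encoder_layers: int = 2      # N = 2
    num_decoder_layers: int = 2      # M = 2
    num_query_particles: int = 128   # P = 128 query motion particles / motion modes
    avgpool_k1: int = 9              # PSD module AvgPool kernel size k1
    avgpool_k2: int = 3              # PSD module AvgPool kernel size k2
    learning_rate: float = 1e-4     # Adam optimizer
    batch_size: int = 128            # B = 128

def token_dim(sampling_rate_hz: int) -> int:
    """Token dimension matches the IMU sampling rate: 100 Hz -> 100, 200 Hz -> 200."""
    supported = {100: 100, 200: 200}
    return supported[sampling_rate_hz]

config = IMoTConfig()
```

Under this reading, a dataset logged at 200 Hz (e.g., some OxIOD sequences) would use 200-dimensional tokens, while 100 Hz datasets would use 100-dimensional tokens.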